feat: implement multi-credential priority call design#2468
Conversation
PR Reviewer Guide 🔍(Review updated until commit 159b88c)Here are some key observations to aid the review process:
|
PR Code Suggestions ✨Latest suggestions up to 159b88c
Previous suggestionsSuggestions up to commit 2cefa7a
|
64ad373 to
44449b7
Compare
Add OrderedCredentialSwitcher for N-credential failover, MultiCredentialVLM and FailoverEmbedder for credential switching, support automatic migration from legacy backup config
Separate the monolithic PERMANENT error class so the switcher reacts to the actual root cause: - 400 (request-level parameter error) -> PERMANENT, fail-fast: same request fails on every credential of the same model. - 401/403/unauthorized/accountoverdue -> new AUTH class: credential-level, advances to the next credential in multi-credential mode. - new CONTENT_SAFETY class (moderation rejections) and INPUT_TOO_LARGE -> fail-fast: switching credentials cannot help. classify_api_error now checks CONTENT_SAFETY before PERMANENT so a moderation message containing "400" is not misclassified. On fail-fast the failover wrappers re-raise the original exception (preserving type/info) instead of wrapping it, so callers can react (e.g. truncate on input_too_large). AllCredentialsFailedError is reserved for chain exhaustion. The legacy PrimaryBackupSwitcher also switches on AUTH to keep existing backup behavior.
get_active_index() previously mutated state: when a failback threshold was met it would decrement the active index. Because observability properties (active_credential_index / active_credential_id) call it, merely reading the current credential for logging or metrics could accidentally advance the failback state machine. Split the concern: get_active_index() is now a pure read, and a new public maybe_failback() performs the one-step failback (and logs an info line when the active credential index changes). The request loops in MultiCredentialVLM and FailoverEmbedder call maybe_failback() at the top of each attempt, so the failback behavior is unchanged while pure reads no longer have side effects.
The failover loops exited on `idx >= n OR total_attempts >= total_max_retries` (default 10). With more than 10 credentials, or when failback churn inflated the attempt count, this could raise AllCredentialsFailedError before every credential had actually been tried, leaving lower-priority credentials unused. Remove total_max_retries entirely from MultiCredentialVLM and FailoverEmbedder: credential exhaustion is now decided solely by reaching the end of the chain (idx >= n), and per-credential retries remain the responsibility of each underlying instance via its own max_retries. The aggregated error tuple now records the failing credential index instead of the attempt counter. Adds a regression test covering more than 10 credentials all being tried.
encoding_format, model_path, cache_dir, enable_fusion, res_level and max_video_frames describe how a model runs, not which credential is used. All credentials of a single embedding model share the same model, so these belong on the parent EmbeddingModelConfig, not on each credential. Keeping them on the credential forced a `cred.X or config.X` merge in _create_failover_embedder, which silently dropped explicit falsy values (enable_fusion=False, res_level=0, max_video_frames=0) and fell back to the parent value. Remove these six fields from EmbeddingCredential and read them directly from the parent config when building per-credential embedders. id/provider/model/ api_key/api_base/api_version/ak/sk/region/host/extra_headers remain credential-level.
FailoverEmbedder.get_dimension() returned a hardcoded 2048 when the first embedder had no get_dimension(). That path is reached only when wrapping sparse embedders, which have no fixed dense dimension; returning a fabricated 2048 silently feeds a wrong dimension to callers (e.g. schema creation). Delegate to the first embedder and raise AttributeError when it has no get_dimension(), surfacing the misuse instead of hiding it.
Two issues in the cross-instance token usage merge: 1. Encapsulation: FailoverVLM / MultiCredentialVLM / FailoverEmbedder reached into other instances' private _token_tracker. Add a public token_tracker accessor on VLMBase and use it in the VLM mergers. 2. Double counting in FailoverEmbedder: embedders share a process-wide singleton token tracker (_get_token_tracker), so all wrapped embedders point at the same object. Merging N identical trackers inflated usage N-fold. Return a single instance's usage directly instead of merging.
Splitting 401/403/unauthorized/accountoverdue out of PERMANENT into the new AUTH class (commit b477b9f) regressed the circuit breaker: it only tripped immediately on PERMANENT/QUOTA_EXCEEDED, so auth errors no longer opened the breaker right away. For a single embedding instance an auth failure (key invalid / no permission / overdue) is persistent and retrying is pointless, so the breaker should still trip immediately. Add ERROR_CLASS_AUTH to the immediate-trip set and update the classification tests to assert the new AUTH class (403 still trips the breaker).
743c3c6 to
a218db0
Compare
Without setting _last_switch_time, maybe_failback() retreats to idx 0 immediately because the default 0 timestamp is always older than the 600s timeout, so the unavailable last credential never actually gets exercised.
8cdead0 to
159b88c
Compare
|
Persistent review updated to latest commit 159b88c |
qin-ctx
left a comment
There was a problem hiding this comment.
本次 review 发现 3 个 blocking 问题,主要是新 credentials 归一化没有完整保留旧 providers 语义,以及 embedding credential 的 provider/model/auth fallback 在校验、维度、运行时构造之间不一致。
Non-blocking:
- PR 已注明 ov.conf 文档/示例待补;这是用户可见新配置格式,建议补 credentials 示例和旧 backup/providers 迁移说明。
- 当前 commit 中有多条
fix/format,建议合并前 squash 或改成可追踪的 conventional commits.
qin-ctx
left a comment
There was a problem hiding this comment.
本次 review 仍发现 1 个 blocking 问题:credentials 模式跳过了 embedding provider-specific 校验,会放行 litellm 未设置 dimension 等无效配置,导致 schema 维度和运行时 embedder 行为不一致。
Non-blocking:
- CI lint 当前失败:ruff format --check 提示 openviking/storage/collection_schemas.py 和 openviking_cli/utils/config/embedding_config.py 需要格式化。
- PR 已注明 ov.conf 文档/示例待补;这是用户可见的新配置格式,建议补 credentials 示例和旧 backup/providers 迁移说明。
- 当前 commit 中有多条 fix/format,建议合并前 squash 或改成可追踪的 conventional commits。
Description
为 OpenViking 客户端(ov.conf)增加多模型凭证 + 按序自动 fallback能力:同一个 VLM / Embedding 模型可配置一个扁平有序的凭证列表(数组顺序即调用优先级,每条带稳定 id),调用失败时按错误类型自动 fallback
到下一条凭证,并在高优先级凭证冷却后自动 failback。
Related Issue
Type of Change
[✓] New feature (non-breaking change that adds functionality)
[✓] Bug fix (non-breaking change that fixes an issue)
[✓] Refactoring (no functional changes)
Changes Made
核心特性
• 新增 OrderedCredentialSwitcher(N 级有序凭证切换状态机,含逐级 failback),MultiCredentialVLM 与 FailoverEmbedder 包装器;配置层 VLMConfig.credentials / EmbeddingModelConfig.credentials,旧 backup
字段自动归一化兼容。
• 鉴权方式从旧顶层配置切换为 credentials 时,模型/维度未变可无损复用已有向量(allow_metadata_override,默认 False,dimension 变化仍强制 rebuild)。
错误分类与切换策略
• 拆分错误类:PERMANENT(400 参数错误) / 新增 AUTH(401/403/欠费) / 新增 CONTENT_SAFETY / INPUT_TOO_LARGE / QUOTA_EXCEEDED / TRANSIENT / UNKNOWN。
• 切换策略:请求级错误(400 / 输入过大 / 内容安全)fail-fast 并 re-raise 原始异常;凭证级 AUTH 与 QUOTA_EXCEEDED 前移;TRANSIENT 用尽底层重试后前移;UNKNOWN 保守前移。
• circuit breaker 对 AUTH 同样立即熔断。
评审整改
• get_active_index() 改为纯读、无副作用;新增 maybe_failback() 专司 failback(切换时打 info 日志)。
• 移除全局 total_max_retries,凭证耗尽仅由 idx >= n 判定(修复 >10 凭证提前失败)。
• EmbeddingCredential 移除 6 个模型行为字段(修复 or 合并吞掉 0/False)。
• FailoverEmbedder.get_dimension 抛错替代魔法值 2048。
• token usage 改用公开访问器;修复 embedder 共享单例 tracker 导致的重复累加。
Testing
[✓] I have added tests that prove my fix is effective or that my feature works
[✓] New and existing unit tests pass locally with my changes
[✓] I have tested this on the following platforms:
[✓] macOS
测试:test_ordered_credential_switcher / test_vlm_failover / test_failover_embedder_permanent_advance / test_embedding_credential_model_override / test_model_retry / test_circuit_breaker,共 127 passed。
Checklist
[✓] My code follows the project's coding style
[✓] I have performed a self-review of my code
[✓] I have commented my code, particularly in hard-to-understand areas
[ ] I have made corresponding changes to the documentation(ov.conf 文档/示例待补)
[✓] My changes generate no new warnings
[ ] Any dependent changes have been merged and published
Additional Notes
• 设计前提:同一模型的所有凭证指向同一逻辑模型(endpoint 可不同)。因此请求级错误换凭证无意义 → fail-fast;凭证级错误(key 失效/欠费)才前移。
• 每条凭证的瞬时重试由其底层实例的 max_retries 负责,无全局重试上限。
• 向后兼容:旧 backup、旧顶层鉴权配置均自动归一化为 credentials,存量配置零改动。