Feature/multi credential failover 0.3.x#2554
Open
fengluodb wants to merge 25 commits into
Open
Conversation
Add OrderedCredentialSwitcher for N-credential failover, MultiCredentialVLM and FailoverEmbedder for credential switching, support automatic migration from legacy backup config
Separate the monolithic PERMANENT error class so the switcher reacts to the actual root cause: - 400 (request-level parameter error) -> PERMANENT, fail-fast: same request fails on every credential of the same model. - 401/403/unauthorized/accountoverdue -> new AUTH class: credential-level, advances to the next credential in multi-credential mode. - new CONTENT_SAFETY class (moderation rejections) and INPUT_TOO_LARGE -> fail-fast: switching credentials cannot help. classify_api_error now checks CONTENT_SAFETY before PERMANENT so a moderation message containing "400" is not misclassified. On fail-fast the failover wrappers re-raise the original exception (preserving type/info) instead of wrapping it, so callers can react (e.g. truncate on input_too_large). AllCredentialsFailedError is reserved for chain exhaustion. The legacy PrimaryBackupSwitcher also switches on AUTH to keep existing backup behavior.
get_active_index() previously mutated state: when a failback threshold was met it would decrement the active index. Because observability properties (active_credential_index / active_credential_id) call it, merely reading the current credential for logging or metrics could accidentally advance the failback state machine. Split the concern: get_active_index() is now a pure read, and a new public maybe_failback() performs the one-step failback (and logs an info line when the active credential index changes). The request loops in MultiCredentialVLM and FailoverEmbedder call maybe_failback() at the top of each attempt, so the failback behavior is unchanged while pure reads no longer have side effects.
The failover loops exited on `idx >= n OR total_attempts >= total_max_retries` (default 10). With more than 10 credentials, or when failback churn inflated the attempt count, this could raise AllCredentialsFailedError before every credential had actually been tried, leaving lower-priority credentials unused. Remove total_max_retries entirely from MultiCredentialVLM and FailoverEmbedder: credential exhaustion is now decided solely by reaching the end of the chain (idx >= n), and per-credential retries remain the responsibility of each underlying instance via its own max_retries. The aggregated error tuple now records the failing credential index instead of the attempt counter. Adds a regression test covering more than 10 credentials all being tried.
encoding_format, model_path, cache_dir, enable_fusion, res_level and max_video_frames describe how a model runs, not which credential is used. All credentials of a single embedding model share the same model, so these belong on the parent EmbeddingModelConfig, not on each credential. Keeping them on the credential forced a `cred.X or config.X` merge in _create_failover_embedder, which silently dropped explicit falsy values (enable_fusion=False, res_level=0, max_video_frames=0) and fell back to the parent value. Remove these six fields from EmbeddingCredential and read them directly from the parent config when building per-credential embedders. id/provider/model/ api_key/api_base/api_version/ak/sk/region/host/extra_headers remain credential-level.
FailoverEmbedder.get_dimension() returned a hardcoded 2048 when the first embedder had no get_dimension(). That path is reached only when wrapping sparse embedders, which have no fixed dense dimension; returning a fabricated 2048 silently feeds a wrong dimension to callers (e.g. schema creation). Delegate to the first embedder and raise AttributeError when it has no get_dimension(), surfacing the misuse instead of hiding it.
Two issues in the cross-instance token usage merge: 1. Encapsulation: FailoverVLM / MultiCredentialVLM / FailoverEmbedder reached into other instances' private _token_tracker. Add a public token_tracker accessor on VLMBase and use it in the VLM mergers. 2. Double counting in FailoverEmbedder: embedders share a process-wide singleton token tracker (_get_token_tracker), so all wrapped embedders point at the same object. Merging N identical trackers inflated usage N-fold. Return a single instance's usage directly instead of merging.
Splitting 401/403/unauthorized/accountoverdue out of PERMANENT into the new AUTH class (commit b477b9f) regressed the circuit breaker: it only tripped immediately on PERMANENT/QUOTA_EXCEEDED, so auth errors no longer opened the breaker right away. For a single embedding instance an auth failure (key invalid / no permission / overdue) is persistent and retrying is pointless, so the breaker should still trip immediately. Add ERROR_CLASS_AUTH to the immediate-trip set and update the classification tests to assert the new AUTH class (403 still trips the breaker).
Without setting _last_switch_time, maybe_failback() retreats to idx 0 immediately because the default 0 timestamp is always older than the 600s timeout, so the unavailable last credential never actually gets exercised.
PR Reviewer Guide 🔍Here are some key observations to aid the review process:
|
PR Code Suggestions ✨Explore these optional code suggestions:
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
Related Issue
Type of Change
Changes Made
Testing
Checklist
Screenshots (if applicable)
Additional Notes