
[FEATURE] Provider abstraction with model discovery #1249

@nujragan93

Description


📋 Prerequisites

📝 Feature Summary

Introduce a Provider abstraction with model discovery in Kagent, allowing operators to configure multiple providers and users to select a provider and discover available models dynamically during model creation.

❓ Problem Statement / Motivation

Current Limitations / Pain Points

  • Provider details are effectively static and managed through fixed configuration rather than being dynamically extensible.
  • Users cannot choose between multiple providers (e.g., OpenAI, Bedrock, internal vLLM).
  • Model creation today relies on a fixed set of stock model options exposed by the UI/CLI, rather than dynamically discovering models from the selected provider.

Who Is Affected

  • Platform / Infra teams managing Kagent as a shared service
  • Application teams consuming models across different providers
  • Operators who need RBAC, auditability, and safe multi-provider support

Why This Is Needed

Without a provider abstraction:

  • UX is poor and error-prone: users must type model names by hand against a fixed provider, with no way to discover what is actually available

💡 Proposed Solution

Option 1: Default provider via ConfigMap, with Kagent referring to a default API key for models

Allow operators to define a default provider through a ConfigMap, with credentials supplied to the Kagent controller pod via a mounted Secret or environment variables. Kagent can then authenticate against the provider and fetch its available models, which suits simpler or single-provider setups.

# Example ConfigMap for configuring LLM providers with model discovery
#
# This ConfigMap defines the providers available for model discovery.
# The controller reads this ConfigMap and exposes the providers via the
# /api/providers/configured endpoint. Models can be discovered dynamically
# via /api/providers/configured/{name}/models.
#
# Prerequisites:
# - Create Secrets for each provider's API key (see examples below)
#
# Usage:
#   kubectl apply -f kagent-providers-configmap.yaml
#   kubectl apply -f provider-secrets.yaml  # Create secrets first
#
apiVersion: v1
kind: ConfigMap
metadata:
  name: kagent-providers
  namespace: kagent
  labels:
    app.kubernetes.io/name: kagent
    app.kubernetes.io/component: provider-config
data:
  # YAML list of provider configurations
  # Each provider has:
  #   - name: unique identifier for this provider instance
  #   - type: provider type (OpenAI, Anthropic, AzureOpenAI, Ollama, Gemini, etc.)
  #   - endpoint: base URL for the provider API
  #   - secretRef: reference to Kubernetes Secret containing API key
  providers: |
    - name: openai-prod
      type: OpenAI
      endpoint: https://api.openai.com/v1
      secretRef:
        name: openai-api-key
        key: apiKey

    - name: anthropic-prod
      type: Anthropic
      endpoint: https://api.anthropic.com
      secretRef:
        name: anthropic-api-key
        key: apiKey

    - name: azure-openai-eastus
      type: AzureOpenAI
      endpoint: https://my-resource.openai.azure.com
      secretRef:
        name: azure-openai-key
        key: apiKey

    - name: ollama-local
      type: Ollama
      endpoint: http://ollama.default.svc.cluster.local:11434
      secretRef:
        name: ollama-placeholder
        key: apiKey

    - name: litellm-gateway
      type: OpenAI
      endpoint: https://litellm.internal.company.com/v1
      secretRef:
        name: litellm-api-key
        key: apiKey
---
# Example Secrets for provider API keys
# IMPORTANT: In production, use sealed-secrets, external-secrets, or vault
# These examples use placeholder values - replace with real API keys
apiVersion: v1
kind: Secret
metadata:
  name: openai-api-key
  namespace: kagent
type: Opaque
stringData:
  apiKey: "sk-your-openai-api-key-here"
---
apiVersion: v1
kind: Secret
metadata:
  name: anthropic-api-key
  namespace: kagent
type: Opaque
stringData:
  apiKey: "sk-ant-your-anthropic-api-key-here"
---
apiVersion: v1
kind: Secret
metadata:
  name: azure-openai-key
  namespace: kagent
type: Opaque
stringData:
  apiKey: "your-azure-openai-api-key-here"
---
apiVersion: v1
kind: Secret
metadata:
  name: ollama-placeholder
  namespace: kagent
type: Opaque
stringData:
  # Ollama doesn't require an API key, but we need a placeholder
  apiKey: "not-required"
---
apiVersion: v1
kind: Secret
metadata:
  name: litellm-api-key
  namespace: kagent
type: Opaque
stringData:
  apiKey: "your-litellm-master-key-here"
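
Before exposing providers on `/api/providers/configured`, the controller would need to validate each entry parsed from the ConfigMap's `providers` key. A minimal sketch of that validation (Python for brevity; Kagent itself is a Go controller, and the names `ALLOWED_TYPES` and `validate_provider` are hypothetical, not existing Kagent API):

```python
# Hypothetical validation of a provider entry parsed from the
# kagent-providers ConfigMap's `providers` list.
ALLOWED_TYPES = {"OpenAI", "Anthropic", "AzureOpenAI", "Ollama", "Gemini"}

def validate_provider(entry: dict) -> list[str]:
    """Return a list of problems; an empty list means the entry is usable."""
    problems = []
    for field in ("name", "type", "endpoint", "secretRef"):
        if field not in entry:
            problems.append(f"missing required field: {field}")
    if entry.get("type") and entry["type"] not in ALLOWED_TYPES:
        problems.append(f"unknown provider type: {entry['type']}")
    secret = entry.get("secretRef") or {}
    if "secretRef" in entry and not secret.get("name"):
        problems.append("secretRef.name is required")
    if "secretRef" in entry and not secret.get("key"):
        problems.append("secretRef.key is required")
    return problems
```

Entries that fail validation would be skipped (and logged) rather than failing the whole ConfigMap, so one bad provider does not take down the others.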

Option 2: Provider as a First-Class Resource

Introduce a Provider Custom Resource Definition (CRD) managed by the Kagent operator.

apiVersion: kagent.dev/v1alpha1
kind: Provider
metadata:
  name: openai-prod
spec:
  type: OpenAI
  endpoint: https://api.openai.com/v1
  auth:
    secretRef:
      name: openai-secret
      key: apiKey

Key Characteristics:

  • Declarative, typed, and validated
  • Namespace-scoped by default (cluster-scoped optional)
  • RBAC-controlled access
  • Best fit for multi-provider, multi-tenant environments
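
Namespace scoping is what makes the RBAC story work: the controller resolves a Provider's `secretRef` against the Provider's own namespace, so access to credentials is gated by Secret RBAC in that namespace. A sketch of that resolution (Python for illustration; `credential_lookup` is a hypothetical name):

```python
def credential_lookup(provider: dict) -> tuple[str, str, str]:
    """Return (namespace, secret_name, secret_key) for a Provider object.

    Namespace-scoped by default: the Secret is read from the Provider's
    own namespace, never from an arbitrary one named in the spec.
    """
    ns = provider["metadata"].get("namespace", "default")
    ref = provider["spec"]["auth"]["secretRef"]
    # `key` could default to "apiKey" to match the ConfigMap examples above
    return ns, ref["name"], ref.get("key", "apiKey")
```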

2. Provider Selection by Users

When creating a Model, users explicitly select a provider by name.

apiVersion: kagent.dev/v1alpha2
kind: ModelConfig
metadata:
  name: gpt-4o
spec:
  providerRef: # field could also be named `provider`
    name: openai-prod
  model: gpt-4o

Behavior:

  • Only Ready providers are selectable
  • RBAC determines which providers a user can reference
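
The two behavior rules above could be enforced when a ModelConfig references a provider, roughly like this (Python for illustration; `resolve_provider_ref` is a hypothetical name, and the RBAC check is left to the API layer):

```python
def resolve_provider_ref(model_config: dict, providers: dict[str, dict]) -> dict:
    """Resolve spec.providerRef.name, allowing only Ready providers."""
    name = model_config["spec"]["providerRef"]["name"]
    provider = providers.get(name)
    if provider is None:
        raise LookupError(f"provider {name!r} not found")
    if provider.get("status", {}).get("phase") != "Ready":
        raise RuntimeError(f"provider {name!r} is not Ready")
    return provider
```

Running this check at admission time (e.g. in a validating webhook) would reject ModelConfigs that point at missing or unhealthy providers before they are persisted.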

3. Model Discovery per Provider

Operator Responsibilities

  • Configure providers and credentials
  • Enable model discovery in the Provider spec

Kagent Responsibilities

  • Watch Provider resources
  • Query provider APIs (periodic or on-demand)
  • Cache discovered models per provider
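
The query-and-cache responsibilities above could look like the following sketch (Python for illustration). It assumes an OpenAI-compatible `GET /v1/models` response shape (`{"data": [{"id": ...}]}`); other provider types would need their own parsers, and `ModelCache` is a hypothetical name:

```python
import json
import time

def parse_models_response(body: str) -> list[str]:
    """Extract model ids from an OpenAI-style GET /v1/models response."""
    return sorted(item["id"] for item in json.loads(body)["data"])

class ModelCache:
    """Per-provider cache of discovered models with a refresh TTL."""

    def __init__(self, ttl_seconds: float = 300.0, now=time.monotonic):
        self._ttl = ttl_seconds
        self._now = now  # injectable clock, useful for testing
        self._entries: dict[str, tuple[float, list[str]]] = {}

    def put(self, provider: str, models: list[str]) -> None:
        self._entries[provider] = (self._now(), models)

    def get(self, provider: str):
        entry = self._entries.get(provider)
        if entry is None or self._now() - entry[0] > self._ttl:
            return None  # stale or missing: caller should re-discover
        return entry[1]
```

A periodic reconcile loop would refresh expired entries, while the `[Fetch Models]` UI action would bypass the TTL and force a re-query.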

UX Flow

  1. User starts Model creation
  2. User selects a Provider
  3. Kagent fetches discovered models for that provider
  4. UI/CLI displays all available models

Validation:

  • Selected model must exist in the discovered model list
  • Manual overrides allowed only if explicitly configured
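
Both validation rules fit in one small check (Python for illustration; `validate_model_selection` and the `allow_override` flag are hypothetical names for the behavior described above):

```python
def validate_model_selection(model: str, discovered: list[str],
                             allow_override: bool = False) -> None:
    """Reject models absent from the discovered list unless overrides are enabled."""
    if model not in discovered and not allow_override:
        raise ValueError(
            f"model {model!r} is not in the discovered model list for this provider"
        )
```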

4. Status & Observability

Provider status reflects readiness and discovery health:

status:
  phase: Ready
  discoveredModels: 12
  conditions:
    - type: DiscoverySuccessful
      status: "True"

Failure scenarios (auth errors, API timeouts) surface clearly via status and events.
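
A sketch of how the controller might compute that status block from a discovery attempt (Python for illustration; the `Degraded` phase and `provider_status` name are assumptions, not settled API):

```python
def provider_status(models=None, error=None) -> dict:
    """Build a Provider status block from the result of a discovery attempt."""
    if error is None:
        return {
            "phase": "Ready",
            "discoveredModels": len(models or []),
            "conditions": [
                {"type": "DiscoverySuccessful", "status": "True"},
            ],
        }
    # Auth errors, API timeouts, etc. surface as a failed condition
    return {
        "phase": "Degraded",
        "discoveredModels": 0,
        "conditions": [
            {"type": "DiscoverySuccessful", "status": "False", "reason": error},
        ],
    }
```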

UI Mockup
┌─────────────────────────────────────────────┐
│ Name: [openai-gpt-4o ] [✏️] │
│ Namespace: [default ▼] │
│ Provider: [🤖 OpenAI ▼] [Fetch Models] │
│ Model: [gpt-4o ▼] │
└─────────────────────────────────────────────┘

🔄 Alternatives Considered

No response

🎯 Affected Service(s)

UI Service

📚 Additional Context

  • Aligns with Kubernetes-native design patterns
  • Enables future extensions:
    • Per-provider quotas
    • Cost attribution
    • Routing and policy enforcement

🙋 Are you willing to contribute?

  • I am willing to submit a PR for this feature
