
[FEATURE] Provider abstraction with model discovery #1249

@nujragan93

Description


📋 Prerequisites

📝 Feature Summary

Introduce a Provider abstraction with model discovery in Kagent, allowing operators to configure multiple providers and users to select a provider and discover available models dynamically during model creation.

❓ Problem Statement / Motivation

Current Limitations / Pain Points

  • Provider details are effectively static and managed through fixed configuration rather than being dynamically extensible.
  • Users cannot choose between multiple providers (e.g., OpenAI, Bedrock, internal vLLM).
  • Model creation today relies on a fixed set of stock model options exposed by the UI/CLI, rather than dynamically discovering models from the selected provider.

Who Is Affected

  • Platform / Infra teams managing Kagent as a shared service
  • Application teams consuming models across different providers
  • Operators who need RBAC, auditability, and safe multi-provider support

Why This Is Needed

Without a provider abstraction:

  • UX is poor and error-prone: users must type model names by hand against a fixed provider, with no way to discover what is actually available

💡 Proposed Solution

Option 1: Default provider via ConfigMap, with Kagent referring to a default API key for models

Allow operators to define a default provider through a ConfigMap, with credentials supplied to the Kagent controller pod via a mounted Secret or environment variables. Kagent can then authenticate against the provider and fetch its available models, which suits simpler or single-provider setups.

# Example ConfigMap for configuring LLM providers with model discovery
#
# This ConfigMap defines the providers available for model discovery.
# The controller reads this ConfigMap and exposes the providers via the
# /api/providers/configured endpoint. Models can be discovered dynamically
# via /api/providers/configured/{name}/models.
#
# Prerequisites:
# - Create Secrets for each provider's API key (see examples below)
#
# Usage:
#   kubectl apply -f kagent-providers-configmap.yaml
#   kubectl apply -f provider-secrets.yaml  # Create secrets first
#
apiVersion: v1
kind: ConfigMap
metadata:
  name: kagent-providers
  namespace: kagent
  labels:
    app.kubernetes.io/name: kagent
    app.kubernetes.io/component: provider-config
data:
  # YAML list of provider configurations
  # Each provider has:
  #   - name: unique identifier for this provider instance
  #   - type: provider type (OpenAI, Anthropic, AzureOpenAI, Ollama, Gemini, etc.)
  #   - endpoint: base URL for the provider API
  #   - secretRef: reference to Kubernetes Secret containing API key
  providers: |
    - name: openai-prod
      type: OpenAI
      endpoint: https://api.openai.com/v1
      secretRef:
        name: openai-api-key
        key: apiKey

    - name: anthropic-prod
      type: Anthropic
      endpoint: https://api.anthropic.com
      secretRef:
        name: anthropic-api-key
        key: apiKey

    - name: azure-openai-eastus
      type: AzureOpenAI
      endpoint: https://my-resource.openai.azure.com
      secretRef:
        name: azure-openai-key
        key: apiKey

    - name: ollama-local
      type: Ollama
      endpoint: http://ollama.default.svc.cluster.local:11434
      secretRef:
        name: ollama-placeholder
        key: apiKey

    - name: litellm-gateway
      type: OpenAI
      endpoint: https://litellm.internal.company.com/v1
      secretRef:
        name: litellm-api-key
        key: apiKey
---
# Example Secrets for provider API keys
# IMPORTANT: In production, use sealed-secrets, external-secrets, or vault
# These examples use placeholder values - replace with real API keys
apiVersion: v1
kind: Secret
metadata:
  name: openai-api-key
  namespace: kagent
type: Opaque
stringData:
  apiKey: "sk-your-openai-api-key-here"
---
apiVersion: v1
kind: Secret
metadata:
  name: anthropic-api-key
  namespace: kagent
type: Opaque
stringData:
  apiKey: "sk-ant-your-anthropic-api-key-here"
---
apiVersion: v1
kind: Secret
metadata:
  name: azure-openai-key
  namespace: kagent
type: Opaque
stringData:
  apiKey: "your-azure-openai-api-key-here"
---
apiVersion: v1
kind: Secret
metadata:
  name: ollama-placeholder
  namespace: kagent
type: Opaque
stringData:
  # Ollama doesn't require an API key, but we need a placeholder
  apiKey: "not-required"
---
apiVersion: v1
kind: Secret
metadata:
  name: litellm-api-key
  namespace: kagent
type: Opaque
stringData:
  apiKey: "your-litellm-master-key-here"
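
Before exposing providers on `/api/providers/configured`, the controller would need to validate each entry parsed from the ConfigMap's `providers` key. A minimal sketch of that validation (Python for brevity; Kagent itself is a Go controller, and the names `ALLOWED_TYPES` and `validate_provider` are hypothetical, not existing Kagent API):

```python
# Hypothetical validation of a provider entry parsed from the
# kagent-providers ConfigMap's `providers` list.
ALLOWED_TYPES = {"OpenAI", "Anthropic", "AzureOpenAI", "Ollama", "Gemini"}

def validate_provider(entry: dict) -> list[str]:
    """Return a list of problems; an empty list means the entry is usable."""
    problems = []
    for field in ("name", "type", "endpoint", "secretRef"):
        if field not in entry:
            problems.append(f"missing required field: {field}")
    if entry.get("type") and entry["type"] not in ALLOWED_TYPES:
        problems.append(f"unknown provider type: {entry['type']}")
    secret = entry.get("secretRef") or {}
    if "secretRef" in entry and not secret.get("name"):
        problems.append("secretRef.name is required")
    if "secretRef" in entry and not secret.get("key"):
        problems.append("secretRef.key is required")
    return problems
```

Entries that fail validation would be skipped (and logged) rather than failing the whole ConfigMap, so one bad provider does not take down the others.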

Option 2: Provider as a First-Class Resource

Introduce a Provider Custom Resource Definition (CRD) managed by the Kagent operator.

apiVersion: kagent.dev/v1alpha1
kind: Provider
metadata:
  name: openai-prod
spec:
  type: OpenAI
  endpoint: https://api.openai.com/v1
  auth:
    secretRef:
      name: openai-secret
      key: apiKey

Key Characteristics:

  • Declarative, typed, and validated
  • Namespace-scoped by default (cluster-scoped optional)
  • RBAC-controlled access
  • Best fit for multi-provider, multi-tenant environments
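
Namespace scoping is what makes the RBAC story work: the controller resolves a Provider's `secretRef` against the Provider's own namespace, so access to credentials is gated by Secret RBAC in that namespace. A sketch of that resolution (Python for illustration; `credential_lookup` is a hypothetical name):

```python
def credential_lookup(provider: dict) -> tuple[str, str, str]:
    """Return (namespace, secret_name, secret_key) for a Provider object.

    Namespace-scoped by default: the Secret is read from the Provider's
    own namespace, never from an arbitrary one named in the spec.
    """
    ns = provider["metadata"].get("namespace", "default")
    ref = provider["spec"]["auth"]["secretRef"]
    # `key` could default to "apiKey" to match the ConfigMap examples above
    return ns, ref["name"], ref.get("key", "apiKey")
```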

2. Provider Selection by Users

When creating a Model, users explicitly select a provider by name.

apiVersion: kagent.dev/v1alpha2
kind: ModelConfig
metadata:
  name: gpt-4o
spec:
  providerRef: # field could also be named `provider`
    name: openai-prod
  model: gpt-4o

Behavior:

  • Only Ready providers are selectable
  • RBAC determines which providers a user can reference
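
The two behavior rules above could be enforced when a ModelConfig references a provider, roughly like this (Python for illustration; `resolve_provider_ref` is a hypothetical name, and the RBAC check is left to the API layer):

```python
def resolve_provider_ref(model_config: dict, providers: dict[str, dict]) -> dict:
    """Resolve spec.providerRef.name, allowing only Ready providers."""
    name = model_config["spec"]["providerRef"]["name"]
    provider = providers.get(name)
    if provider is None:
        raise LookupError(f"provider {name!r} not found")
    if provider.get("status", {}).get("phase") != "Ready":
        raise RuntimeError(f"provider {name!r} is not Ready")
    return provider
```

Running this check at admission time (e.g. in a validating webhook) would reject ModelConfigs that point at missing or unhealthy providers before they are persisted.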

3. Model Discovery per Provider

Operator Responsibilities

  • Configure providers and credentials
  • Enable model discovery in the Provider spec

Kagent Responsibilities

  • Watch Provider resources
  • Query provider APIs (periodic or on-demand)
  • Cache discovered models per provider
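
The query-and-cache responsibilities above could look like the following sketch (Python for illustration). It assumes an OpenAI-compatible `GET /v1/models` response shape (`{"data": [{"id": ...}]}`); other provider types would need their own parsers, and `ModelCache` is a hypothetical name:

```python
import json
import time

def parse_models_response(body: str) -> list[str]:
    """Extract model ids from an OpenAI-style GET /v1/models response."""
    return sorted(item["id"] for item in json.loads(body)["data"])

class ModelCache:
    """Per-provider cache of discovered models with a refresh TTL."""

    def __init__(self, ttl_seconds: float = 300.0, now=time.monotonic):
        self._ttl = ttl_seconds
        self._now = now  # injectable clock, useful for testing
        self._entries: dict[str, tuple[float, list[str]]] = {}

    def put(self, provider: str, models: list[str]) -> None:
        self._entries[provider] = (self._now(), models)

    def get(self, provider: str):
        entry = self._entries.get(provider)
        if entry is None or self._now() - entry[0] > self._ttl:
            return None  # stale or missing: caller should re-discover
        return entry[1]
```

A periodic reconcile loop would refresh expired entries, while the `[Fetch Models]` UI action would bypass the TTL and force a re-query.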

UX Flow

  1. User starts Model creation
  2. User selects a Provider
  3. Kagent fetches discovered models for that provider
  4. UI/CLI displays all available models

Validation:

  • Selected model must exist in the discovered model list
  • Manual overrides allowed only if explicitly configured
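
Both validation rules fit in one small check (Python for illustration; `validate_model_selection` and the `allow_override` flag are hypothetical names for the behavior described above):

```python
def validate_model_selection(model: str, discovered: list[str],
                             allow_override: bool = False) -> None:
    """Reject models absent from the discovered list unless overrides are enabled."""
    if model not in discovered and not allow_override:
        raise ValueError(
            f"model {model!r} is not in the discovered model list for this provider"
        )
```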

4. Status & Observability

Provider status reflects readiness and discovery health:

status:
  phase: Ready
  discoveredModels: 12
  conditions:
    - type: DiscoverySuccessful
      status: "True"

Failure scenarios (auth errors, API timeouts) surface clearly via status and events.
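
A sketch of how the controller might compute that status block from a discovery attempt (Python for illustration; the `Degraded` phase and `provider_status` name are assumptions, not settled API):

```python
def provider_status(models=None, error=None) -> dict:
    """Build a Provider status block from the result of a discovery attempt."""
    if error is None:
        return {
            "phase": "Ready",
            "discoveredModels": len(models or []),
            "conditions": [
                {"type": "DiscoverySuccessful", "status": "True"},
            ],
        }
    # Auth errors, API timeouts, etc. surface as a failed condition
    return {
        "phase": "Degraded",
        "discoveredModels": 0,
        "conditions": [
            {"type": "DiscoverySuccessful", "status": "False", "reason": error},
        ],
    }
```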

UI Mockup
┌─────────────────────────────────────────────┐
│ Name: [openai-gpt-4o ] [✏️] │
│ Namespace: [default ▼] │
│ Provider: [🤖 OpenAI ▼] [Fetch Models] │
│ Model: [gpt-4o ▼] │
└─────────────────────────────────────────────┘

🔄 Alternatives Considered

No response

🎯 Affected Service(s)

UI Service

📚 Additional Context

  • Aligns with Kubernetes-native design patterns
  • Enables future extensions:
    • Per-provider quotas
    • Cost attribution
    • Routing and policy enforcement

🙋 Are you willing to contribute?

  • I am willing to submit a PR for this feature
