From f5b05c91fc58ffb8b3c03231f5308300cc5de529 Mon Sep 17 00:00:00 2001
From: Nikita Agarwal
Date: Thu, 22 Jan 2026 14:49:39 +0530
Subject: [PATCH] Refactor Dockerfile to simplify file copying process and update README.md to reflect the transition of the AIRE Standards status from Draft to Live. Enhanced clarity in the Operational Excellence section and updated repository structure details. Added new sponsor image for branding consistency.

---
 Dockerfile                         |   4 +-
 README.md                          |  50 +++++++++++-------
 .../assets}/sponsors/exosphere.png | Bin
 docs/index.md                      |  19 +++----
 4 files changed, 42 insertions(+), 31 deletions(-)
 rename {assets => docs/assets}/sponsors/exosphere.png (100%)

diff --git a/Dockerfile b/Dockerfile
index bc57d11..890c5fb 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -20,9 +20,7 @@ COPY pyproject.toml uv.lock ./
 RUN uv sync --frozen --no-dev
 
 # Copy project files
-RUN mkdir -p docs
-COPY mkdocs.yml ./
-COPY . ./docs/
+COPY . .
 
 # Build the MkDocs site
 RUN uv run mkdocs build --strict --site-dir /app/site
diff --git a/README.md b/README.md
index c742fa6..98a13fd 100644
--- a/README.md
+++ b/README.md
@@ -1,7 +1,7 @@
 # The AI Reliability Engineering (AIRE) Standards
 
 [![License: CC BY 4.0](https://img.shields.io/badge/License-CC%20BY%204.0-lightgrey.svg)](https://creativecommons.org/licenses/by/4.0/)
-[![Status: Draft](https://img.shields.io/badge/Status-Draft%20v0.1-orange)]()
+[![Status: Live](https://img.shields.io/badge/Status-Live%20v0.1-green)](https://github.com/exospherehost/ai-reliability-standards)
 
 > **An open implementation guide for building reliable AI Agents at scale. Defining the practices for AI Reliability Engineering (AIRE).**
 
@@ -134,16 +134,16 @@ Security for AI agents differs from traditional software-agents are autonomous d
 
 ### 5. Operational Excellence & Team Culture
 
-*Establishing SLAs, error budgets, team structures, and operational practices that enable reliable AI systems to scale.*
+*Establishing performance targets, quality budgets, team structures, and operational practices that enable reliable AI systems to scale.*
 
 Operational Excellence bridges the gap between technical architecture and organizational culture. While the first four pillars define *what* to build, this pillar defines *how* teams operate, measure, and continuously improve AI systems at scale:
 
-- **AI-Specific SLAs & Error Budgets** - Service Level Objectives for availability, latency, quality, safety, and efficiency; error budget policies for balancing reliability with innovation velocity
+- **AI-Specific Performance Targets & Quality Budgets** - Performance targets for cognitive accuracy, safety integrity, autonomy level, response performance, and cost efficiency; quality budget policies for balancing reliability with innovation velocity
 - **Team Structure & Shared Responsibility** - Product teams own agents end-to-end; embedded AI Reliability Engineers (AIREs) with 20% time allocation; central platform team provides infrastructure
 - **Progressive Autonomy Maturity Model** - Five levels of agent autonomy (L0: Human-Driven → L4: Autonomous), reducing HITL rate from 100% to <5% over time
-- **Reliability Reviews** - Weekly metric reviews, monthly postmortems, error budget tracking, SLO compliance monitoring
+- **Reliability Reviews** - Weekly metric reviews, monthly postmortems, quality budget tracking, performance target compliance monitoring
 
-**Key Metrics:** SLO Compliance >95%, Error Budget Remaining >25%, HITL Rate <10%, Autonomy Level L3+, Time to Autonomy <6 months
+**Key Metrics:** Performance Target Compliance >95%, Quality Budget Remaining >50%, HITL Rate <10%, Autonomy Level L3+, Time to Autonomy <6 months
 
 📖 **[Read the full Operational Excellence guide →](docs/pillars/operational-excellence.md)**
 
@@ -187,19 +187,31 @@ You get to shape the future of AI reliability engineering and get recognized for
 
 ## Repository Structure
 
-```
-docs/
-├── getting-started.md             # Adoption roadmap for organizations
-├── pillars/
-│   ├── resilient-architecture.md  # Pillar 1: Fault tolerance, scaling, recovery
-│   ├── cognitive-reliability.md   # Pillar 2: Accuracy, consistency, drift detection
-│   ├── quality-lifecycle.md       # Pillar 3: Testing, deployment, feedback loops
-│   ├── security.md                # Pillar 4: JIT access, guardrails, audit logs
-│   └── operational-excellence.md  # Pillar 5: SLAs, team structure, progressive autonomy
-└── appendix/
-    ├── principles.md              # AIRE Principles (5 guiding tenets)
-    ├── metrics-framework.md       # Three-tier metrics framework
-    └── glossary.md                # Key terms and definitions
+This repository contains the source files for the AIRE Standards documentation and deployment infrastructure:
+
+```text
+.
+├── docs/                              # MkDocs documentation source
+│   ├── index.md                       # Documentation homepage
+│   ├── getting-started.md             # Adoption roadmap for organizations
+│   ├── principles.md                  # AIRE Principles (5 guiding tenets)
+│   ├── pillars/                       # Core reliability pillars
+│   │   ├── resilient-architecture.md  # Pillar 1: Fault tolerance, scaling, recovery
+│   │   ├── cognitive-reliability.md   # Pillar 2: Accuracy, consistency, drift detection
+│   │   ├── quality-lifecycle.md       # Pillar 3: Testing, deployment, feedback loops
+│   │   ├── security.md                # Pillar 4: JIT access, guardrails, audit logs
+│   │   └── operational-excellence.md  # Pillar 5: Performance targets, team structure, progressive autonomy
+│   └── appendix/
+│       ├── metrics-framework.md       # Three-tier metrics framework
+│       └── glossary.md                # Key terms and definitions
+├── assets/                            # Static assets (sponsor logos, images)
+├── k8s/                               # Kubernetes deployment manifests
+├── stylesheets/                       # Custom CSS for documentation
+├── mkdocs.yml                         # MkDocs configuration
+├── Dockerfile                         # Container image for documentation site
+├── pyproject.toml                     # Python project dependencies
+├── README.md                          # GitHub repository homepage (this file)
+├── CONTRIBUTORS.md                    # Contributors registry
 ```
 
 ---
@@ -215,7 +227,7 @@ We welcome Pull Requests (PRs) from engineers who have solved specific reliabili
 
 ## Sponsors
 
-ExosphereHost Inc.
+ExosphereHost Inc.
 
 Contact nikita@exosphere.host to sponsor this work.
 
diff --git a/assets/sponsors/exosphere.png b/docs/assets/sponsors/exosphere.png
similarity index 100%
rename from assets/sponsors/exosphere.png
rename to docs/assets/sponsors/exosphere.png
diff --git a/docs/index.md b/docs/index.md
index 36e5d93..1174876 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -1,7 +1,7 @@
 # The AI Reliability Engineering (AIRE) Standards
 
 [![License: CC BY 4.0](https://img.shields.io/badge/License-CC%20BY%204.0-lightgrey.svg)](https://creativecommons.org/licenses/by/4.0/)
-[![Status: Draft](https://img.shields.io/badge/Status-Draft%20v0.1-orange)]()
+[![Status: Live](https://img.shields.io/badge/Status-Live%20v0.1-green)](https://github.com/exospherehost/ai-reliability-standards)
 
 > **An open implementation guide for building reliable AI Agents at scale. Defining the practices for AI Reliability Engineering (AIRE).**
 
@@ -109,7 +109,6 @@ Operational Excellence bridges the gap between technical architecture and organi
 
 ---
 
-
 ## AIRE Principles
 
 *Guiding tenets inspired by SRE:*
@@ -150,7 +149,6 @@ Design for autonomous operation. Human escalation is a safety net for edge cases
 
 ---
 
-
 ## Getting Started
 
 **New to AIRE?** Start with the **[Getting Started Guide →](getting-started.md)** for a step-by-step adoption roadmap:
@@ -189,17 +187,20 @@ You get to shape the future of AI reliability engineering and get recognized for
 
 ## Repository Structure
 
-```
-docs/
+This documentation is built from the [ai-reliability-standards repository](https://github.com/exospherehost/ai-reliability-standards). The repository structure includes:
+
+```text
+docs/                              # Documentation source files
+├── index.md                       # This page (documentation homepage)
 ├── getting-started.md             # Adoption roadmap for organizations
-├── pillars/
+├── principles.md                  # AIRE Principles (5 guiding tenets)
+├── pillars/                       # Core reliability pillars
 │   ├── resilient-architecture.md  # Pillar 1: Fault tolerance, scaling, recovery
 │   ├── cognitive-reliability.md   # Pillar 2: Accuracy, consistency, drift detection
 │   ├── quality-lifecycle.md       # Pillar 3: Testing, deployment, feedback loops
 │   ├── security.md                # Pillar 4: JIT access, guardrails, audit logs
-│   └── operational-excellence.md  # Pillar 5: SLAs, team structure, progressive autonomy
+│   └── operational-excellence.md  # Pillar 5: Performance targets, team structure, progressive autonomy
 └── appendix/
-    ├── principles.md              # AIRE Principles (5 guiding tenets)
     ├── metrics-framework.md       # Three-tier metrics framework
     └── glossary.md                # Key terms and definitions
 ```
@@ -219,7 +220,7 @@ We welcome Pull Requests (PRs) from engineers who have solved specific reliabili
 
 ExosphereHost Inc.
 
-Contact nivedit@exosphere.host to sponsor this work.
+Contact nikita@exosphere.host to sponsor this work.
 
 ## License