diff --git a/README.md b/README.md index 2c27108..def95c0 100644 --- a/README.md +++ b/README.md @@ -9,7 +9,7 @@ **Example applications for [dstack](https://github.com/Dstack-TEE/dstack) - Deploy containerized apps to TEEs with end-to-end security in minutes** -[Getting Started](#getting-started) • [Use Cases](#use-cases) • [Core Patterns](#core-patterns) • [Dev Tools](#dev-scaffolding) • [Starter Packs](#starter-packs) • [Other Use Cases](#other-use-cases) +[Getting Started](#getting-started) • [Confidential AI](#confidential-ai) • [Tutorials](#tutorials) • [Use Cases](#use-cases) • [Core Patterns](#core-patterns) • [Dev Tools](#dev-scaffolding) • [Starter Packs](#starter-packs) @@ -44,7 +44,7 @@ phala simulator start ### Run an Example Locally ```bash -cd tutorial/01-attestation-oracle +cd tutorial/01-attestation docker compose run --rm \ -v ~/.phala-cloud/simulator/0.5.3/dstack.sock:/var/run/dstack.sock \ app @@ -57,7 +57,23 @@ phala auth login phala deploy -n my-app -c docker-compose.yaml ``` -See [Phala Cloud](https://cloud.phala.network) for production TEE deployment. +See [Phala Cloud](https://cloud.phala.com) for production TEE deployment. + +--- + +## Confidential AI + +Run AI workloads where prompts, model weights, and inference stay encrypted in hardware. + +| Example | Description | +|---------|-------------| +| [confidential-ai/inference](./confidential-ai/inference) | Private LLM inference with vLLM on Confidential GPU | +| [confidential-ai/training](./confidential-ai/training) | Confidential fine-tuning on sensitive data using Unsloth | +| [confidential-ai/agents](./confidential-ai/agents) | Secure AI agent with TEE-derived wallet keys using LangChain and Confidential AI models | + +GPU deployments require: `--instance-type h200.small --region US-EAST-1 --image dstack-nvidia-dev-0.5.4.1` + +See [Confidential AI Guide](https://github.com/Dstack-TEE/dstack/blob/master/docs/confidential-ai.md) for concepts and security model. --- @@ -67,10 +83,10 @@ Step-by-step guides covering core dstack concepts. | Tutorial | Description | |----------|-------------| -| [01-attestation-oracle](./tutorial/01-attestation-oracle) | Use the guest SDK to work with attestations directly — build an oracle, bind data to TDX quotes via `report_data`, verify with local scripts | -| [02-persistence-and-kms](./tutorial/02-persistence-and-kms) | Use `getKey()` for deterministic key derivation from a KMS — persistent wallets, same key across restarts | -| [03-gateway-and-ingress](./tutorial/03-gateway-and-ingress) | Custom domains with automatic SSL, certificate evidence chain | -| [04-upgrades](./tutorial/04-upgrades) | Extend `AppAuth.sol` with custom authorization logic — NFT-gated clusters, on-chain governance | +| [01-attestation](./tutorial/01-attestation) | Build an oracle, bind data to TDX quotes via `report_data`, verify with local scripts | +| [02-kms-and-signing](./tutorial/02-kms-and-signing) | Deterministic key derivation from KMS — persistent wallets, same key across restarts | +| [03-gateway-and-tls](./tutorial/03-gateway-and-tls) | Custom domains with automatic SSL, certificate evidence chain | +| [04-onchain-oracle](./tutorial/04-onchain-oracle) | AppAuth contracts, on-chain signature verification, multi-device deployment | --- @@ -120,15 +136,6 @@ TLS termination, custom domains, external connectivity. 
| Example | Description | |---------|-------------| | [dstack-ingress](./custom-domain/dstack-ingress) | **Complete ingress solution** — auto SSL via Let's Encrypt, multi-domain, DNS validation, evidence generation with TDX quote chain | -| [custom-domain](./custom-domain/custom-domain) | Simpler custom domain setup via zt-https | - -### Keys & Persistence - -Persistent keys across deployments via KMS. - -| Example | Description | Status | -|---------|-------------|--------| -| [get-key-basic](./get-key-basic) | `dstack.get_key()` — same key identity across machines | Coming Soon | ### On-Chain Interaction diff --git a/confidential-ai/README.md b/confidential-ai/README.md new file mode 100644 index 0000000..54befc8 --- /dev/null +++ b/confidential-ai/README.md @@ -0,0 +1,23 @@ +# Confidential AI Examples + +Run AI workloads with hardware-enforced privacy. Your prompts, model weights, and computations stay encrypted in memory. + +| Example | Description | Status | +|---------|-------------|--------| +| [inference](./inference) | Private LLM with response signing | Ready to deploy | +| [training](./training) | Fine-tuning on sensitive data | Requires local build | +| [agents](./agents) | AI agent with TEE-derived keys | Requires local build | + +Start with inference—it deploys in one command and shows the full attestation flow. + +```bash +cd inference +phala auth login +phala deploy -n my-llm -c docker-compose.yaml \ + --instance-type h200.small \ + -e TOKEN=your-secret-token +``` + +First deployment takes 10-15 minutes (large images + model loading). Check progress with `phala cvms serial-logs --tail 100`. + +See the [Confidential AI Guide](https://github.com/Dstack-TEE/dstack/blob/master/docs/confidential-ai.md) for how the security model works. diff --git a/confidential-ai/agents/Dockerfile b/confidential-ai/agents/Dockerfile new file mode 100644 index 0000000..a0a1df4 --- /dev/null +++ b/confidential-ai/agents/Dockerfile @@ -0,0 +1,12 @@ +FROM python:3.11-slim + +WORKDIR /app + +COPY requirements.txt . +RUN pip install --no-cache-dir -r requirements.txt + +COPY agent.py . + +EXPOSE 8080 + +CMD ["python", "agent.py"] diff --git a/confidential-ai/agents/README.md b/confidential-ai/agents/README.md new file mode 100644 index 0000000..34015a8 --- /dev/null +++ b/confidential-ai/agents/README.md @@ -0,0 +1,91 @@ +# Secure AI Agent + +Run AI agents with TEE-derived wallet keys. The agent calls a confidential LLM (redpill.ai), so prompts never leave encrypted memory. + +## Quick Start + +```bash +phala auth login +phala deploy -n my-agent -c docker-compose.yaml \ + -e LLM_API_KEY=your-redpill-key +``` + +Your API key is encrypted client-side and only decrypted inside the TEE. 
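+
+Once the agent is running, you can check from any machine that a `/sign` response (exercised in the curl examples below) really comes from the agent's TEE-derived wallet. Below is a minimal client-side sketch, assuming `requests` and `eth-account` are installed locally and using `<app-url>` as a placeholder for your deployment's public URL; it relies only on the `message`, `signature`, and `signer` fields this README documents:
+
+```python
+# verify_sign.py -- client-side check, not part of the deployed agent
+import requests
+from eth_account import Account
+from eth_account.messages import encode_defunct
+
+APP_URL = "https://<app-url>"  # replace with your deployment's public URL
+
+resp = requests.post(f"{APP_URL}/sign", json={"message": "Hello from TEE"}).json()
+
+# Recover the address that produced the signature and compare it to the
+# wallet address the agent reports.
+recovered = Account.recover_message(
+    encode_defunct(text=resp["message"]),
+    signature=resp["signature"],
+)
+print("claimed signer  :", resp["signer"])
+print("recovered signer:", recovered)
+assert recovered.lower() == resp["signer"].lower()
+```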
+
+Test it (replace `<app-url>` with your deployment's public URL):
+
+```bash
+# Get agent info and wallet address
+curl https://<app-url>/
+
+# Chat with the agent
+curl -X POST https://<app-url>/chat \
+  -H "Content-Type: application/json" \
+  -d '{"message": "What is your wallet address?"}'
+
+# Sign a message
+curl -X POST https://<app-url>/sign \
+  -H "Content-Type: application/json" \
+  -d '{"message": "Hello from TEE"}'
+```
+
+## How It Works
+
+```mermaid
+graph TB
+    User -->|TLS| Agent
+    subgraph TEE1[Agent CVM]
+        Agent[Agent Code]
+        Agent --> Wallet[TEE-derived wallet]
+    end
+    Agent -->|TLS| LLM
+    subgraph TEE2[LLM CVM]
+        LLM[redpill.ai]
+    end
+```
+
+The agent derives an Ethereum wallet from TEE keys:
+
+```python
+from dstack_sdk import DstackClient
+from dstack_sdk.ethereum import to_account
+
+client = DstackClient()
+eth_key = client.get_key("agent/wallet", "mainnet")
+account = to_account(eth_key)
+# Same path = same key, even across restarts
+```
+
+Both the agent and the LLM run in separate TEEs. User queries stay encrypted from browser to agent to LLM and back.
+
+## API
+
+| Endpoint | Method | Description |
+|----------|--------|-------------|
+| `/` | GET | Agent info, wallet address, TCB info |
+| `/attestation` | GET | TEE attestation quote |
+| `/chat` | POST | Chat with the agent |
+| `/sign` | POST | Sign a message with agent's wallet |
+
+## Using a Different LLM
+
+The agent uses redpill.ai by default for end-to-end confidentiality. To use a different OpenAI-compatible endpoint:
+
+```bash
+phala deploy -n my-agent -c docker-compose.yaml \
+  -e LLM_BASE_URL=https://api.openai.com/v1 \
+  -e LLM_API_KEY=sk-xxxxx
+```
+
+Note: Using a non-confidential LLM means prompts leave the encrypted environment.
+
+## Cleanup
+
+```bash
+phala cvms delete my-agent --force
+```
+
+## Further Reading
+
+- [Confidential AI Guide](https://github.com/Dstack-TEE/dstack/blob/master/docs/confidential-ai.md)
+- [dstack Python SDK](https://github.com/Dstack-TEE/dstack/tree/master/sdk/python)
diff --git a/confidential-ai/agents/agent.py b/confidential-ai/agents/agent.py
new file mode 100644
index 0000000..0517312
--- /dev/null
+++ b/confidential-ai/agents/agent.py
@@ -0,0 +1,198 @@
+#!/usr/bin/env python3
+"""
+Secure AI Agent with TEE Key Derivation
+
+This agent demonstrates:
+- TEE-derived Ethereum wallet (deterministic, persistent)
+- Protected API credentials (encrypted at deploy)
+- Confidential LLM calls via redpill.ai
+- Attestation proof for execution verification
+"""
+
+import os
+
+from dstack_sdk import DstackClient
+from dstack_sdk.ethereum import to_account
+from eth_account.messages import encode_defunct
+from flask import Flask, jsonify, request
+from langchain_classic.agents import AgentExecutor, create_react_agent
+from langchain_classic.tools import Tool
+from langchain_openai import ChatOpenAI
+from langchain_core.prompts import PromptTemplate
+
+app = Flask(__name__)
+
+# Lazy initialization - only connect when needed
+_client = None
+_account = None
+
+
+def get_client():
+    """Get dstack client (lazy initialization)."""
+    global _client
+    if _client is None:
+        _client = DstackClient()
+    return _client
+
+
+def get_account():
+    """Get Ethereum account (lazy initialization)."""
+    global _account
+    if _account is None:
+        client = get_client()
+        eth_key = client.get_key("agent/wallet", "mainnet")
+        _account = to_account(eth_key)
+        print(f"Agent wallet address: {_account.address}")
+    return _account
+
+
+def get_wallet_address(_: str = "") -> str:
+    """Get the agent's wallet address."""
+    return f"Agent wallet: {get_account().address}"
+
+
+def 
get_attestation(nonce: str = "default") -> str: + """Get TEE attestation quote.""" + quote = get_client().get_quote(nonce.encode()[:64]) + return f"TEE Quote (first 100 chars): {quote.quote[:100]}..." + + +def sign_message(message: str) -> str: + """Sign a message with the agent's wallet.""" + signable = encode_defunct(text=message) + signed = get_account().sign_message(signable) + return f"Signature: {signed.signature.hex()}" + + +# Define agent tools +tools = [ + Tool( + name="GetWallet", + func=get_wallet_address, + description="Get the agent's Ethereum wallet address", + ), + Tool( + name="GetAttestation", + func=get_attestation, + description="Get TEE attestation quote to prove secure execution", + ), + Tool( + name="SignMessage", + func=sign_message, + description="Sign a message with the agent's wallet. Input: the message to sign", + ), +] + +# LangChain agent (lazy initialization) +_agent_executor = None + + +def get_agent_executor(): + """Get LangChain agent executor (lazy initialization).""" + global _agent_executor + if _agent_executor is None: + template = """You are a secure AI agent running in a Trusted Execution Environment (TEE). +You have access to a deterministic Ethereum wallet derived from TEE keys. +Your wallet address and signing capabilities are protected by hardware. + +You have access to the following tools: +{tools} + +Use the following format: +Question: the input question +Thought: think about what to do +Action: the action to take, should be one of [{tool_names}] +Action Input: the input to the action +Observation: the result of the action +... (repeat Thought/Action/Action Input/Observation as needed) +Thought: I now know the final answer +Final Answer: the final answer + +Question: {input} +{agent_scratchpad}""" + + prompt = PromptTemplate.from_template(template) + + # Use redpill.ai for confidential LLM calls (OpenAI-compatible API) + llm = ChatOpenAI( + model=os.environ.get("LLM_MODEL", "openai/gpt-4o-mini"), + base_url=os.environ.get("LLM_BASE_URL", "https://api.redpill.ai/v1"), + api_key=os.environ.get("LLM_API_KEY", ""), + temperature=0, + ) + + agent = create_react_agent(llm, tools, prompt) + _agent_executor = AgentExecutor( + agent=agent, tools=tools, verbose=True, handle_parsing_errors=True + ) + return _agent_executor + + +@app.route("/") +def index(): + """Agent info endpoint.""" + try: + info = get_client().info() + return jsonify( + { + "status": "running", + "wallet": get_account().address, + "app_id": info.app_id, + } + ) + except Exception as e: + return jsonify({"status": "running", "error": str(e)}) + + +@app.route("/attestation") +def attestation(): + """Get TEE attestation.""" + nonce = request.args.get("nonce", "default") + quote = get_client().get_quote(nonce.encode()[:64]) + return jsonify({"quote": quote.quote, "nonce": nonce}) + + +@app.route("/chat", methods=["POST"]) +def chat(): + """Chat with the agent.""" + data = request.get_json() + message = data.get("message", "") + + if not message: + return jsonify({"error": "No message provided"}), 400 + + try: + result = get_agent_executor().invoke({"input": message}) + return jsonify( + { + "response": result["output"], + "wallet": get_account().address, + } + ) + except Exception as e: + return jsonify({"error": str(e)}), 500 + + +@app.route("/sign", methods=["POST"]) +def sign(): + """Sign a message with the agent's wallet.""" + data = request.get_json() + message = data.get("message", "") + + if not message: + return jsonify({"error": "No message provided"}), 400 + + signable = 
encode_defunct(text=message) + signed = get_account().sign_message(signable) + return jsonify( + { + "message": message, + "signature": signed.signature.hex(), + "signer": get_account().address, + } + ) + + +if __name__ == "__main__": + print("Starting agent server...") + app.run(host="0.0.0.0", port=8080) diff --git a/confidential-ai/agents/docker-compose.yaml b/confidential-ai/agents/docker-compose.yaml new file mode 100644 index 0000000..a935fde --- /dev/null +++ b/confidential-ai/agents/docker-compose.yaml @@ -0,0 +1,19 @@ +# Secure AI Agent with TEE Key Derivation +# +# Runs an AI agent with: +# - TEE-derived wallet keys (deterministic, never leave TEE) +# - Encrypted API credentials +# - Confidential LLM calls via redpill.ai +# +# Deploy: phala deploy -n my-agent -c docker-compose.yaml -e LLM_API_KEY=your-key + +services: + agent: + image: h4x3rotab/tee-agent:v0.4 + volumes: + - /var/run/dstack.sock:/var/run/dstack.sock + environment: + - LLM_BASE_URL=https://api.redpill.ai/v1 + - LLM_API_KEY # Encrypted at deploy time + ports: + - "8080:8080" diff --git a/confidential-ai/agents/requirements.txt b/confidential-ai/agents/requirements.txt new file mode 100644 index 0000000..fcb49e2 --- /dev/null +++ b/confidential-ai/agents/requirements.txt @@ -0,0 +1,6 @@ +dstack-sdk>=0.1.0 +langchain-classic>=1.0.0 +langchain-openai>=0.0.5 +langchain-community>=0.4.0 +web3>=6.0.0 +flask>=3.0.0 diff --git a/confidential-ai/agents/test.sh b/confidential-ai/agents/test.sh new file mode 100755 index 0000000..596f315 --- /dev/null +++ b/confidential-ai/agents/test.sh @@ -0,0 +1,57 @@ +#!/bin/bash +# Test the AI agent +# +# Prerequisites: +# - docker compose up (running in background) + +set -e + +BASE_URL="${BASE_URL:-http://localhost:8080}" + +echo "Testing AI Agent at $BASE_URL" +echo "==============================" + +# Wait for service to be ready +echo "Waiting for service..." +for i in {1..30}; do + if curl -s "$BASE_URL/" > /dev/null 2>&1; then + echo "Service ready" + break + fi + sleep 2 +done + +# Test info endpoint +echo -e "\n1. Testing info endpoint..." +INFO=$(curl -s "$BASE_URL/") +echo "Wallet: $(echo "$INFO" | jq -r '.wallet')" +echo "App ID: $(echo "$INFO" | jq -r '.app_id')" + +# Test attestation +echo -e "\n2. Testing attestation..." +ATTESTATION=$(curl -s "$BASE_URL/attestation?nonce=test-123") +QUOTE=$(echo "$ATTESTATION" | jq -r '.quote') +echo "TEE Quote (first 100 chars): ${QUOTE:0:100}..." + +# Test signing +echo -e "\n3. Testing message signing..." +SIG=$(curl -s -X POST "$BASE_URL/sign" \ + -H "Content-Type: application/json" \ + -d '{"message": "Hello from TEE"}') +echo "Signer: $(echo "$SIG" | jq -r '.signer')" +echo "Signature: $(echo "$SIG" | jq -r '.signature' | head -c 66)..." + +# Test chat (requires OPENAI_API_KEY) +if [ -n "$OPENAI_API_KEY" ]; then + echo -e "\n4. Testing chat..." + CHAT=$(curl -s -X POST "$BASE_URL/chat" \ + -H "Content-Type: application/json" \ + -d '{"message": "What is your wallet address?"}') + echo "Response: $(echo "$CHAT" | jq -r '.response')" +else + echo -e "\n4. Skipping chat test (OPENAI_API_KEY not set)" +fi + +echo -e "\n==============================" +echo "Tests completed!" 
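+
+# Optionally keep the full quote fetched in step 2 for offline verification:
+# echo "$QUOTE" > attestation-quote.txt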
+echo "Verify attestation at: https://proof.t16z.com"
diff --git a/confidential-ai/inference/README.md b/confidential-ai/inference/README.md
new file mode 100644
index 0000000..d7170ed
--- /dev/null
+++ b/confidential-ai/inference/README.md
@@ -0,0 +1,110 @@
+# Private LLM Inference
+
+Deploy an OpenAI-compatible LLM endpoint where responses are signed by TEE-derived keys. Clients can verify responses came from genuine confidential hardware.
+
+## Quick Start
+
+```bash
+phala auth login
+phala deploy -n my-inference -c docker-compose.yaml \
+  --instance-type h200.small \
+  -e TOKEN=your-secret-token
+```
+
+The `-e` flag encrypts your token client-side—it only gets decrypted inside the TEE.
+
+First deployment takes 10-15 minutes (5GB+ image, model loading). Watch progress:
+
+```bash
+phala cvms serial-logs --tail 100
+```
+
+Test it (replace `<app-url>` with your deployment's public URL):
+
+```bash
+curl -X POST https://<app-url>/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer your-secret-token" \
+  -d '{"model": "Qwen/Qwen2.5-1.5B-Instruct", "messages": [{"role": "user", "content": "Hello!"}]}'
+```
+
+## How It Works
+
+```mermaid
+graph LR
+    Client -->|TLS| Proxy[vllm-proxy :8000]
+    subgraph TEE[Confidential VM]
+        Proxy -->|HTTP| vLLM
+        Proxy -.->|signs with| Key[TEE-derived key]
+    end
+    Proxy -->|signed response| Client
+```
+
+The proxy authenticates requests, forwards to vLLM, and signs responses with a key derived from the TEE's identity. That signature proves the response came from this specific confidential environment.
+
+## API
+
+OpenAI-compatible. The standard client works directly:
+
+```python
+from openai import OpenAI
+
+client = OpenAI(
+    base_url="https://<app-url>/v1",
+    api_key="your-secret-token"
+)
+
+response = client.chat.completions.create(
+    model="Qwen/Qwen2.5-1.5B-Instruct",
+    messages=[{"role": "user", "content": "Hello!"}]
+)
+```
+
+Additional endpoints:
+
+```bash
+# Get signature for a response (use chat_id from response)
+curl https://<app-url>/v1/signature/<chat_id> -H "Authorization: Bearer $TOKEN"
+
+# Get attestation report
+curl https://<app-url>/v1/attestation/report -H "Authorization: Bearer $TOKEN"
+```
+
+## Using Different Models
+
+Update both the vLLM command and proxy config in docker-compose.yaml:
+
+```yaml
+# vLLM service
+command: --model meta-llama/Llama-3.2-3B-Instruct ...
+
+# Proxy service
+environment:
+  - MODEL_NAME=meta-llama/Llama-3.2-3B-Instruct
+```
+
+For gated models, add your Hugging Face token:
+
+```bash
+phala deploy -n my-inference -c docker-compose.yaml \
+  --instance-type h200.small \
+  -e TOKEN=your-secret-token \
+  -e HF_TOKEN=hf_xxxxx
+```
+
+## Updating Secrets
+
+Change tokens without full redeployment:
+
+```bash
+phala deploy --cvm-id my-inference -c docker-compose.yaml \
+  -e TOKEN=new-secret-token
+```
+
+Old tokens stop working immediately.
+
+## Cleanup
+
+```bash
+phala cvms delete my-inference --force
+```
diff --git a/confidential-ai/inference/docker-compose.yaml b/confidential-ai/inference/docker-compose.yaml
new file mode 100644
index 0000000..0ccaef9
--- /dev/null
+++ b/confidential-ai/inference/docker-compose.yaml
@@ -0,0 +1,42 @@
+# Private LLM Inference in TEE
+#
+# Runs vLLM with vllm-proxy for response signing and attestation.
+# Uses a small model (Qwen 1.5B) for quick testing.
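+#
+# To serve a different model, change the vLLM `--model` flag and the proxy's
+# MODEL_NAME below so they match (see "Using Different Models" in the README).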
+# +# Deploy with encrypted secrets: +# phala deploy -n my-inference -c docker-compose.yaml \ +# --instance-type h200.small \ +# -e TOKEN=your-secret-token + +services: + vllm: + image: vllm/vllm-openai:latest + environment: + - NVIDIA_VISIBLE_DEVICES=all + - HF_TOKEN=${HF_TOKEN:-} + command: > + --model Qwen/Qwen2.5-1.5B-Instruct + --host 0.0.0.0 + --port 8000 + --max-model-len 4096 + --gpu-memory-utilization 0.8 + deploy: + resources: + reservations: + devices: + - driver: nvidia + count: all + capabilities: [gpu] + + proxy: + image: phalanetwork/vllm-proxy:v0.2.18 + volumes: + - /var/run/dstack.sock:/var/run/dstack.sock + environment: + - VLLM_BASE_URL=http://vllm:8000 + - MODEL_NAME=Qwen/Qwen2.5-1.5B-Instruct + - TOKEN=${TOKEN} + ports: + - "8000:8000" + depends_on: + - vllm diff --git a/confidential-ai/inference/requirements.txt b/confidential-ai/inference/requirements.txt new file mode 100644 index 0000000..a8608b2 --- /dev/null +++ b/confidential-ai/inference/requirements.txt @@ -0,0 +1 @@ +requests>=2.28.0 diff --git a/confidential-ai/inference/test.sh b/confidential-ai/inference/test.sh new file mode 100755 index 0000000..3b0f649 --- /dev/null +++ b/confidential-ai/inference/test.sh @@ -0,0 +1,55 @@ +#!/bin/bash +# Test vllm-proxy inference and attestation +# +# Prerequisites: +# - docker compose up (running in background) +# - pip install requests + +set -e + +BASE_URL="${BASE_URL:-http://localhost:8000}" + +echo "Testing vllm-proxy at $BASE_URL" +echo "================================" + +# Wait for service to be ready +echo "Waiting for service..." +for i in {1..60}; do + if curl -s "$BASE_URL/health" > /dev/null 2>&1; then + echo "Service ready" + break + fi + sleep 2 +done + +# Test chat completion +echo -e "\n1. Testing chat completion..." +RESPONSE=$(curl -s "$BASE_URL/v1/chat/completions" \ + -H "Content-Type: application/json" \ + -d '{ + "model": "Qwen/Qwen2.5-1.5B-Instruct", + "messages": [{"role": "user", "content": "Say hello in exactly 5 words"}], + "max_tokens": 50 + }') + +CHAT_ID=$(echo "$RESPONSE" | jq -r '.id') +CONTENT=$(echo "$RESPONSE" | jq -r '.choices[0].message.content') + +echo "Chat ID: $CHAT_ID" +echo "Response: $CONTENT" + +# Test signature retrieval +echo -e "\n2. Testing signature retrieval..." +SIG=$(curl -s "$BASE_URL/v1/signature/$CHAT_ID") +echo "Signing address: $(echo "$SIG" | jq -r '.signing_address')" +echo "Response hash: $(echo "$SIG" | jq -r '.response_hash')" + +# Test attestation +echo -e "\n3. Testing attestation..." +ATTESTATION=$(curl -s "$BASE_URL/v1/attestation?nonce=test-nonce") +QUOTE=$(echo "$ATTESTATION" | jq -r '.quote') +echo "TEE Quote (first 100 chars): ${QUOTE:0:100}..." + +echo -e "\n================================" +echo "All tests passed!" +echo "Verify attestation at: https://proof.t16z.com" diff --git a/confidential-ai/inference/verify.py b/confidential-ai/inference/verify.py new file mode 100644 index 0000000..1aed227 --- /dev/null +++ b/confidential-ai/inference/verify.py @@ -0,0 +1,120 @@ +#!/usr/bin/env python3 +""" +Verify LLM responses from vllm-proxy. + +This script demonstrates how to: +1. Send chat completions to the TEE-hosted LLM +2. Retrieve and verify response signatures +3. 
Fetch TEE attestation quotes +""" + +import argparse +import hashlib +import json +import sys + +import requests + + +def chat_completion(base_url: str, message: str) -> dict: + """Send a chat completion request.""" + response = requests.post( + f"{base_url}/v1/chat/completions", + json={ + "model": "Qwen/Qwen2.5-1.5B-Instruct", + "messages": [{"role": "user", "content": message}], + "max_tokens": 256, + }, + headers={"Content-Type": "application/json"}, + ) + response.raise_for_status() + return response.json() + + +def get_signature(base_url: str, chat_id: str) -> dict: + """Retrieve the signature for a chat completion.""" + response = requests.get(f"{base_url}/v1/signature/{chat_id}") + response.raise_for_status() + return response.json() + + +def get_attestation(base_url: str, nonce: str = "verification-nonce") -> dict: + """Fetch TEE attestation quote.""" + response = requests.get(f"{base_url}/v1/attestation", params={"nonce": nonce}) + response.raise_for_status() + return response.json() + + +def verify_response_hash(response_content: str, expected_hash: str) -> bool: + """Verify response content matches the signed hash.""" + computed = hashlib.sha256(response_content.encode()).hexdigest() + return computed == expected_hash + + +def main(): + parser = argparse.ArgumentParser(description="Verify vllm-proxy responses") + parser.add_argument( + "--url", + default="http://localhost:8000", + help="vllm-proxy URL (default: http://localhost:8000)", + ) + parser.add_argument( + "--message", + default="What is confidential computing?", + help="Message to send", + ) + parser.add_argument( + "--attestation-only", + action="store_true", + help="Only fetch attestation, skip chat", + ) + args = parser.parse_args() + + if args.attestation_only: + print("Fetching attestation...") + attestation = get_attestation(args.url) + print(f"\nTEE Quote (first 200 chars):\n{attestation.get('quote', '')[:200]}...") + if attestation.get("gpu_evidence"): + print(f"\nGPU Evidence: {attestation['gpu_evidence'][:100]}...") + print("\nVerify at: https://proof.t16z.com") + return + + # Send chat completion + print(f"Sending message: {args.message}") + completion = chat_completion(args.url, args.message) + + chat_id = completion["id"] + response_text = completion["choices"][0]["message"]["content"] + + print(f"\nResponse:\n{response_text}") + print(f"\nChat ID: {chat_id}") + + # Get signature + print("\nFetching signature...") + sig = get_signature(args.url, chat_id) + + print(f"Request hash: {sig.get('request_hash', 'N/A')}") + print(f"Response hash: {sig.get('response_hash', 'N/A')}") + print(f"ECDSA sig: {sig.get('ecdsa_signature', 'N/A')[:64]}...") + print(f"Signing addr: {sig.get('signing_address', 'N/A')}") + + # Get attestation + print("\nFetching attestation...") + attestation = get_attestation(args.url) + + quote = attestation.get("quote", "") + print(f"TEE Quote: {quote[:64]}..." if quote else "No quote available") + + # Summary + print("\n" + "=" * 60) + print("VERIFICATION CHECKLIST") + print("=" * 60) + print("1. [ ] TEE quote is valid (verify at proof.t16z.com)") + print("2. [ ] Signing address in quote matches response signer") + print("3. [ ] Response hash matches actual response content") + print(f"\nPaste this quote at https://proof.t16z.com for verification:") + print(quote[:200] + "..." 
if len(quote) > 200 else quote)
+
+
+if __name__ == "__main__":
+    main()
diff --git a/confidential-ai/training/README.md b/confidential-ai/training/README.md
new file mode 100644
index 0000000..7ddd2bf
--- /dev/null
+++ b/confidential-ai/training/README.md
@@ -0,0 +1,45 @@
+# Confidential Training
+
+Fine-tune LLMs on sensitive data using JupyterLab inside a TEE. Upload your dataset through the browser, run training cells, download your model.
+
+## Quick Start
+
+```bash
+phala auth login
+phala deploy -n my-training -c docker-compose.yaml \
+  --instance-type h200.small \
+  -e HF_TOKEN=hf_xxxxx \
+  -e JUPYTER_PASSWORD=your-secret
+```
+
+Open `https://<app-url>:8888` (your deployment's public URL) and log in with your password.
+
+## Workflow
+
+```mermaid
+graph LR
+    You -->|upload data| Jupyter
+    subgraph TEE[Confidential VM]
+        Jupyter[JupyterLab]
+        Jupyter --> GPU[Confidential GPU]
+    end
+    Jupyter -->|download model| You
+```
+
+1. Use the pre-installed notebooks in `/workspace/unsloth-notebooks/` or upload `finetune.ipynb` from this repo
+2. Upload your training data (JSONL with `instruction` and `response` fields)
+3. Run the cells to train
+4. Download the output folder with your fine-tuned weights
+
+Your data and model weights stay encrypted in GPU memory throughout training.
+
+## Cleanup
+
+```bash
+phala cvms delete my-training --force
+```
+
+## Further Reading
+
+- [Confidential AI Guide](https://github.com/Dstack-TEE/dstack/blob/master/docs/confidential-ai.md)
+- [Unsloth](https://github.com/unslothai/unsloth) for the training library
diff --git a/confidential-ai/training/docker-compose.yaml b/confidential-ai/training/docker-compose.yaml
new file mode 100644
index 0000000..67ce674
--- /dev/null
+++ b/confidential-ai/training/docker-compose.yaml
@@ -0,0 +1,26 @@
+# Confidential Fine-tuning with Jupyter
+#
+# Run JupyterLab inside a TEE for interactive training.
+# Upload data via browser, run training cells, download results.
+#
+# Deploy: phala deploy -n my-training -c docker-compose.yaml \
+#   --instance-type h200.small -e HF_TOKEN=hf_xxx -e JUPYTER_PASSWORD=your-secret
+
+services:
+  jupyter:
+    image: unsloth/unsloth:latest
+    volumes:
+      - /var/run/dstack.sock:/var/run/dstack.sock
+    environment:
+      - NVIDIA_VISIBLE_DEVICES=all
+      - HF_TOKEN
+      - JUPYTER_PASSWORD
+    ports:
+      - "8888:8888"
+    deploy:
+      resources:
+        reservations:
+          devices:
+            - driver: nvidia
+              count: all
+              capabilities: [gpu]
diff --git a/confidential-ai/training/finetune.ipynb b/confidential-ai/training/finetune.ipynb
new file mode 100644
index 0000000..ea3d71d
--- /dev/null
+++ b/confidential-ai/training/finetune.ipynb
@@ -0,0 +1,162 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Confidential Fine-tuning\n",
+    "\n",
+    "Fine-tune an LLM on your data inside a TEE. Upload your dataset, run the cells, download your model."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 1. 
Configure\n", + "\n", + "Upload your training data using the file browser (left sidebar), then set the path below.\n", + "\n", + "Data format: JSONL with `instruction` and `response` fields:\n", + "```json\n", + "{\"instruction\": \"What is 2+2?\", \"response\": \"4\"}\n", + "{\"instruction\": \"Explain gravity\", \"response\": \"Gravity is...\"}\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "DATA_PATH = \"data.jsonl\" # Path to your uploaded data\n", + "MODEL_NAME = \"unsloth/Llama-3.2-1B-Instruct\"\n", + "OUTPUT_DIR = \"output\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 2. Load Model" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from unsloth import FastLanguageModel\n", + "\n", + "model, tokenizer = FastLanguageModel.from_pretrained(\n", + " model_name=MODEL_NAME,\n", + " max_seq_length=2048,\n", + " load_in_4bit=True,\n", + ")\n", + "\n", + "model = FastLanguageModel.get_peft_model(\n", + " model,\n", + " r=16,\n", + " lora_alpha=16,\n", + " lora_dropout=0,\n", + " target_modules=[\"q_proj\", \"k_proj\", \"v_proj\", \"o_proj\"],\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3. Load Data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import json\n", + "from datasets import Dataset\n", + "\n", + "with open(DATA_PATH) as f:\n", + " data = [json.loads(line) for line in f]\n", + "\n", + "formatted = [\n", + " {\"text\": f\"### Instruction:\\n{item['instruction']}\\n\\n### Response:\\n{item['response']}\"}\n", + " for item in data\n", + "]\n", + "\n", + "dataset = Dataset.from_list(formatted)\n", + "print(f\"Loaded {len(dataset)} examples\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 4. Train" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from transformers import TrainingArguments\n", + "from trl import SFTTrainer\n", + "\n", + "trainer = SFTTrainer(\n", + " model=model,\n", + " train_dataset=dataset,\n", + " dataset_text_field=\"text\",\n", + " max_seq_length=2048,\n", + " args=TrainingArguments(\n", + " output_dir=OUTPUT_DIR,\n", + " per_device_train_batch_size=2,\n", + " gradient_accumulation_steps=4,\n", + " num_train_epochs=1,\n", + " learning_rate=2e-4,\n", + " fp16=True,\n", + " logging_steps=1,\n", + " save_strategy=\"epoch\",\n", + " ),\n", + " tokenizer=tokenizer,\n", + ")\n", + "\n", + "trainer.train()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 5. Save & Download" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "model.save_pretrained(OUTPUT_DIR)\n", + "tokenizer.save_pretrained(OUTPUT_DIR)\n", + "print(f\"Model saved to {OUTPUT_DIR}/\")\n", + "print(\"Use the file browser to download the output folder.\")" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +}