diff --git a/README.md b/README.md index 2c27108..def95c0 100644 --- a/README.md +++ b/README.md @@ -9,7 +9,7 @@ **Example applications for [dstack](https://github.com/Dstack-TEE/dstack) - Deploy containerized apps to TEEs with end-to-end security in minutes** -[Getting Started](#getting-started) • [Use Cases](#use-cases) • [Core Patterns](#core-patterns) • [Dev Tools](#dev-scaffolding) • [Starter Packs](#starter-packs) • [Other Use Cases](#other-use-cases) +[Getting Started](#getting-started) • [Confidential AI](#confidential-ai) • [Tutorials](#tutorials) • [Use Cases](#use-cases) • [Core Patterns](#core-patterns) • [Dev Tools](#dev-scaffolding) • [Starter Packs](#starter-packs) @@ -44,7 +44,7 @@ phala simulator start ### Run an Example Locally ```bash -cd tutorial/01-attestation-oracle +cd tutorial/01-attestation docker compose run --rm \ -v ~/.phala-cloud/simulator/0.5.3/dstack.sock:/var/run/dstack.sock \ app @@ -57,7 +57,23 @@ phala auth login phala deploy -n my-app -c docker-compose.yaml ``` -See [Phala Cloud](https://cloud.phala.network) for production TEE deployment. +See [Phala Cloud](https://cloud.phala.com) for production TEE deployment. + +--- + +## Confidential AI + +Run AI workloads where prompts, model weights, and inference stay encrypted in hardware. + +| Example | Description | +|---------|-------------| +| [confidential-ai/inference](./confidential-ai/inference) | Private LLM inference with vLLM on Confidential GPU | +| [confidential-ai/training](./confidential-ai/training) | Confidential fine-tuning on sensitive data using Unsloth | +| [confidential-ai/agents](./confidential-ai/agents) | Secure AI agent with TEE-derived wallet keys using LangChain and Confidential AI models | + +GPU deployments require: `--instance-type h200.small --region US-EAST-1 --image dstack-nvidia-dev-0.5.4.1` + +See [Confidential AI Guide](https://github.com/Dstack-TEE/dstack/blob/master/docs/confidential-ai.md) for concepts and security model. --- @@ -67,10 +83,10 @@ Step-by-step guides covering core dstack concepts. | Tutorial | Description | |----------|-------------| -| [01-attestation-oracle](./tutorial/01-attestation-oracle) | Use the guest SDK to work with attestations directly — build an oracle, bind data to TDX quotes via `report_data`, verify with local scripts | -| [02-persistence-and-kms](./tutorial/02-persistence-and-kms) | Use `getKey()` for deterministic key derivation from a KMS — persistent wallets, same key across restarts | -| [03-gateway-and-ingress](./tutorial/03-gateway-and-ingress) | Custom domains with automatic SSL, certificate evidence chain | -| [04-upgrades](./tutorial/04-upgrades) | Extend `AppAuth.sol` with custom authorization logic — NFT-gated clusters, on-chain governance | +| [01-attestation](./tutorial/01-attestation) | Build an oracle, bind data to TDX quotes via `report_data`, verify with local scripts | +| [02-kms-and-signing](./tutorial/02-kms-and-signing) | Deterministic key derivation from KMS — persistent wallets, same key across restarts | +| [03-gateway-and-tls](./tutorial/03-gateway-and-tls) | Custom domains with automatic SSL, certificate evidence chain | +| [04-onchain-oracle](./tutorial/04-onchain-oracle) | AppAuth contracts, on-chain signature verification, multi-device deployment | --- @@ -120,15 +136,6 @@ TLS termination, custom domains, external connectivity. 
| Example | Description | |---------|-------------| | [dstack-ingress](./custom-domain/dstack-ingress) | **Complete ingress solution** — auto SSL via Let's Encrypt, multi-domain, DNS validation, evidence generation with TDX quote chain | -| [custom-domain](./custom-domain/custom-domain) | Simpler custom domain setup via zt-https | - -### Keys & Persistence - -Persistent keys across deployments via KMS. - -| Example | Description | Status | -|---------|-------------|--------| -| [get-key-basic](./get-key-basic) | `dstack.get_key()` — same key identity across machines | Coming Soon | ### On-Chain Interaction diff --git a/confidential-ai/README.md b/confidential-ai/README.md new file mode 100644 index 0000000..54befc8 --- /dev/null +++ b/confidential-ai/README.md @@ -0,0 +1,23 @@ +# Confidential AI Examples + +Run AI workloads with hardware-enforced privacy. Your prompts, model weights, and computations stay encrypted in memory. + +| Example | Description | Status | +|---------|-------------|--------| +| [inference](./inference) | Private LLM with response signing | Ready to deploy | +| [training](./training) | Fine-tuning on sensitive data | Requires local build | +| [agents](./agents) | AI agent with TEE-derived keys | Requires local build | + +Start with inference—it deploys in one command and shows the full attestation flow. + +```bash +cd inference +phala auth login +phala deploy -n my-llm -c docker-compose.yaml \ + --instance-type h200.small \ + -e TOKEN=your-secret-token +``` + +First deployment takes 10-15 minutes (large images + model loading). Check progress with `phala cvms serial-logs --tail 100`. + +See the [Confidential AI Guide](https://github.com/Dstack-TEE/dstack/blob/master/docs/confidential-ai.md) for how the security model works. diff --git a/confidential-ai/agents/Dockerfile b/confidential-ai/agents/Dockerfile new file mode 100644 index 0000000..a0a1df4 --- /dev/null +++ b/confidential-ai/agents/Dockerfile @@ -0,0 +1,12 @@ +FROM python:3.11-slim + +WORKDIR /app + +COPY requirements.txt . +RUN pip install --no-cache-dir -r requirements.txt + +COPY agent.py . + +EXPOSE 8080 + +CMD ["python", "agent.py"] diff --git a/confidential-ai/agents/README.md b/confidential-ai/agents/README.md new file mode 100644 index 0000000..34015a8 --- /dev/null +++ b/confidential-ai/agents/README.md @@ -0,0 +1,91 @@ +# Secure AI Agent + +Run AI agents with TEE-derived wallet keys. The agent calls a confidential LLM (redpill.ai), so prompts never leave encrypted memory. + +## Quick Start + +```bash +phala auth login +phala deploy -n my-agent -c docker-compose.yaml \ + -e LLM_API_KEY=your-redpill-key +``` + +Your API key is encrypted client-side and only decrypted inside the TEE. 
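+
+Once the agent is running, you can check from any machine that a `/sign` response (exercised in the curl examples below) really comes from the agent's TEE-derived wallet. Below is a minimal client-side sketch, assuming `requests` and `eth-account` are installed locally and using `<app-url>` as a placeholder for your deployment's public URL; it relies only on the `message`, `signature`, and `signer` fields this README documents:
+
+```python
+# verify_sign.py -- client-side check, not part of the deployed agent
+import requests
+from eth_account import Account
+from eth_account.messages import encode_defunct
+
+APP_URL = "https://<app-url>"  # replace with your deployment's public URL
+
+resp = requests.post(f"{APP_URL}/sign", json={"message": "Hello from TEE"}).json()
+
+# Recover the address that produced the signature and compare it to the
+# wallet address the agent reports.
+recovered = Account.recover_message(
+    encode_defunct(text=resp["message"]),
+    signature=resp["signature"],
+)
+print("claimed signer  :", resp["signer"])
+print("recovered signer:", recovered)
+assert recovered.lower() == resp["signer"].lower()
+```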
+
+Test it (replace `<app-url>` with your deployment's public URL):
+
+```bash
+# Get agent info and wallet address
+curl https://<app-url>/
+
+# Chat with the agent
+curl -X POST https://<app-url>/chat \
+  -H "Content-Type: application/json" \
+  -d '{"message": "What is your wallet address?"}'
+
+# Sign a message
+curl -X POST https://<app-url>/sign \
+  -H "Content-Type: application/json" \
+  -d '{"message": "Hello from TEE"}'
+```
+
+## How It Works
+
+```mermaid
+graph TB
+    User -->|TLS| Agent
+    subgraph TEE1[Agent CVM]
+        Agent[Agent Code]
+        Agent --> Wallet[TEE-derived wallet]
+    end
+    Agent -->|TLS| LLM
+    subgraph TEE2[LLM CVM]
+        LLM[redpill.ai]
+    end
+```
+
+The agent derives an Ethereum wallet from TEE keys:
+
+```python
+from dstack_sdk import DstackClient
+from dstack_sdk.ethereum import to_account
+
+client = DstackClient()
+eth_key = client.get_key("agent/wallet", "mainnet")
+account = to_account(eth_key)
+# Same path = same key, even across restarts
+```
+
+Both the agent and the LLM run in separate TEEs. User queries stay encrypted from browser to agent to LLM and back.
+
+## API
+
+| Endpoint | Method | Description |
+|----------|--------|-------------|
+| `/` | GET | Agent info, wallet address, TCB info |
+| `/attestation` | GET | TEE attestation quote |
+| `/chat` | POST | Chat with the agent |
+| `/sign` | POST | Sign a message with agent's wallet |
+
+## Using a Different LLM
+
+The agent uses redpill.ai by default for end-to-end confidentiality. To use a different OpenAI-compatible endpoint:
+
+```bash
+phala deploy -n my-agent -c docker-compose.yaml \
+  -e LLM_BASE_URL=https://api.openai.com/v1 \
+  -e LLM_API_KEY=sk-xxxxx
+```
+
+Note: Using a non-confidential LLM means prompts leave the encrypted environment.
+
+## Cleanup
+
+```bash
+phala cvms delete my-agent --force
+```
+
+## Further Reading
+
+- [Confidential AI Guide](https://github.com/Dstack-TEE/dstack/blob/master/docs/confidential-ai.md)
+- [dstack Python SDK](https://github.com/Dstack-TEE/dstack/tree/master/sdk/python)
diff --git a/confidential-ai/agents/agent.py b/confidential-ai/agents/agent.py
new file mode 100644
index 0000000..0517312
--- /dev/null
+++ b/confidential-ai/agents/agent.py
@@ -0,0 +1,198 @@
+#!/usr/bin/env python3
+"""
+Secure AI Agent with TEE Key Derivation
+
+This agent demonstrates:
+- TEE-derived Ethereum wallet (deterministic, persistent)
+- Protected API credentials (encrypted at deploy)
+- Confidential LLM calls via redpill.ai
+- Attestation proof for execution verification
+"""
+
+import os
+
+from dstack_sdk import DstackClient
+from dstack_sdk.ethereum import to_account
+from eth_account.messages import encode_defunct
+from flask import Flask, jsonify, request
+from langchain_classic.agents import AgentExecutor, create_react_agent
+from langchain_classic.tools import Tool
+from langchain_openai import ChatOpenAI
+from langchain_core.prompts import PromptTemplate
+
+app = Flask(__name__)
+
+# Lazy initialization - only connect when needed
+_client = None
+_account = None
+
+
+def get_client():
+    """Get dstack client (lazy initialization)."""
+    global _client
+    if _client is None:
+        _client = DstackClient()
+    return _client
+
+
+def get_account():
+    """Get Ethereum account (lazy initialization)."""
+    global _account
+    if _account is None:
+        client = get_client()
+        eth_key = client.get_key("agent/wallet", "mainnet")
+        _account = to_account(eth_key)
+        print(f"Agent wallet address: {_account.address}")
+    return _account
+
+
+def get_wallet_address(_: str = "") -> str:
+    """Get the agent's wallet address."""
+    return f"Agent wallet: {get_account().address}"
+
+
+def 
get_attestation(nonce: str = "default") -> str: + """Get TEE attestation quote.""" + quote = get_client().get_quote(nonce.encode()[:64]) + return f"TEE Quote (first 100 chars): {quote.quote[:100]}..." + + +def sign_message(message: str) -> str: + """Sign a message with the agent's wallet.""" + signable = encode_defunct(text=message) + signed = get_account().sign_message(signable) + return f"Signature: {signed.signature.hex()}" + + +# Define agent tools +tools = [ + Tool( + name="GetWallet", + func=get_wallet_address, + description="Get the agent's Ethereum wallet address", + ), + Tool( + name="GetAttestation", + func=get_attestation, + description="Get TEE attestation quote to prove secure execution", + ), + Tool( + name="SignMessage", + func=sign_message, + description="Sign a message with the agent's wallet. Input: the message to sign", + ), +] + +# LangChain agent (lazy initialization) +_agent_executor = None + + +def get_agent_executor(): + """Get LangChain agent executor (lazy initialization).""" + global _agent_executor + if _agent_executor is None: + template = """You are a secure AI agent running in a Trusted Execution Environment (TEE). +You have access to a deterministic Ethereum wallet derived from TEE keys. +Your wallet address and signing capabilities are protected by hardware. + +You have access to the following tools: +{tools} + +Use the following format: +Question: the input question +Thought: think about what to do +Action: the action to take, should be one of [{tool_names}] +Action Input: the input to the action +Observation: the result of the action +... (repeat Thought/Action/Action Input/Observation as needed) +Thought: I now know the final answer +Final Answer: the final answer + +Question: {input} +{agent_scratchpad}""" + + prompt = PromptTemplate.from_template(template) + + # Use redpill.ai for confidential LLM calls (OpenAI-compatible API) + llm = ChatOpenAI( + model=os.environ.get("LLM_MODEL", "openai/gpt-4o-mini"), + base_url=os.environ.get("LLM_BASE_URL", "https://api.redpill.ai/v1"), + api_key=os.environ.get("LLM_API_KEY", ""), + temperature=0, + ) + + agent = create_react_agent(llm, tools, prompt) + _agent_executor = AgentExecutor( + agent=agent, tools=tools, verbose=True, handle_parsing_errors=True + ) + return _agent_executor + + +@app.route("/") +def index(): + """Agent info endpoint.""" + try: + info = get_client().info() + return jsonify( + { + "status": "running", + "wallet": get_account().address, + "app_id": info.app_id, + } + ) + except Exception as e: + return jsonify({"status": "running", "error": str(e)}) + + +@app.route("/attestation") +def attestation(): + """Get TEE attestation.""" + nonce = request.args.get("nonce", "default") + quote = get_client().get_quote(nonce.encode()[:64]) + return jsonify({"quote": quote.quote, "nonce": nonce}) + + +@app.route("/chat", methods=["POST"]) +def chat(): + """Chat with the agent.""" + data = request.get_json() + message = data.get("message", "") + + if not message: + return jsonify({"error": "No message provided"}), 400 + + try: + result = get_agent_executor().invoke({"input": message}) + return jsonify( + { + "response": result["output"], + "wallet": get_account().address, + } + ) + except Exception as e: + return jsonify({"error": str(e)}), 500 + + +@app.route("/sign", methods=["POST"]) +def sign(): + """Sign a message with the agent's wallet.""" + data = request.get_json() + message = data.get("message", "") + + if not message: + return jsonify({"error": "No message provided"}), 400 + + signable = 
encode_defunct(text=message) + signed = get_account().sign_message(signable) + return jsonify( + { + "message": message, + "signature": signed.signature.hex(), + "signer": get_account().address, + } + ) + + +if __name__ == "__main__": + print("Starting agent server...") + app.run(host="0.0.0.0", port=8080) diff --git a/confidential-ai/agents/docker-compose.yaml b/confidential-ai/agents/docker-compose.yaml new file mode 100644 index 0000000..a935fde --- /dev/null +++ b/confidential-ai/agents/docker-compose.yaml @@ -0,0 +1,19 @@ +# Secure AI Agent with TEE Key Derivation +# +# Runs an AI agent with: +# - TEE-derived wallet keys (deterministic, never leave TEE) +# - Encrypted API credentials +# - Confidential LLM calls via redpill.ai +# +# Deploy: phala deploy -n my-agent -c docker-compose.yaml -e LLM_API_KEY=your-key + +services: + agent: + image: h4x3rotab/tee-agent:v0.4 + volumes: + - /var/run/dstack.sock:/var/run/dstack.sock + environment: + - LLM_BASE_URL=https://api.redpill.ai/v1 + - LLM_API_KEY # Encrypted at deploy time + ports: + - "8080:8080" diff --git a/confidential-ai/agents/requirements.txt b/confidential-ai/agents/requirements.txt new file mode 100644 index 0000000..fcb49e2 --- /dev/null +++ b/confidential-ai/agents/requirements.txt @@ -0,0 +1,6 @@ +dstack-sdk>=0.1.0 +langchain-classic>=1.0.0 +langchain-openai>=0.0.5 +langchain-community>=0.4.0 +web3>=6.0.0 +flask>=3.0.0 diff --git a/confidential-ai/agents/test.sh b/confidential-ai/agents/test.sh new file mode 100755 index 0000000..596f315 --- /dev/null +++ b/confidential-ai/agents/test.sh @@ -0,0 +1,57 @@ +#!/bin/bash +# Test the AI agent +# +# Prerequisites: +# - docker compose up (running in background) + +set -e + +BASE_URL="${BASE_URL:-http://localhost:8080}" + +echo "Testing AI Agent at $BASE_URL" +echo "==============================" + +# Wait for service to be ready +echo "Waiting for service..." +for i in {1..30}; do + if curl -s "$BASE_URL/" > /dev/null 2>&1; then + echo "Service ready" + break + fi + sleep 2 +done + +# Test info endpoint +echo -e "\n1. Testing info endpoint..." +INFO=$(curl -s "$BASE_URL/") +echo "Wallet: $(echo "$INFO" | jq -r '.wallet')" +echo "App ID: $(echo "$INFO" | jq -r '.app_id')" + +# Test attestation +echo -e "\n2. Testing attestation..." +ATTESTATION=$(curl -s "$BASE_URL/attestation?nonce=test-123") +QUOTE=$(echo "$ATTESTATION" | jq -r '.quote') +echo "TEE Quote (first 100 chars): ${QUOTE:0:100}..." + +# Test signing +echo -e "\n3. Testing message signing..." +SIG=$(curl -s -X POST "$BASE_URL/sign" \ + -H "Content-Type: application/json" \ + -d '{"message": "Hello from TEE"}') +echo "Signer: $(echo "$SIG" | jq -r '.signer')" +echo "Signature: $(echo "$SIG" | jq -r '.signature' | head -c 66)..." + +# Test chat (requires OPENAI_API_KEY) +if [ -n "$OPENAI_API_KEY" ]; then + echo -e "\n4. Testing chat..." + CHAT=$(curl -s -X POST "$BASE_URL/chat" \ + -H "Content-Type: application/json" \ + -d '{"message": "What is your wallet address?"}') + echo "Response: $(echo "$CHAT" | jq -r '.response')" +else + echo -e "\n4. Skipping chat test (OPENAI_API_KEY not set)" +fi + +echo -e "\n==============================" +echo "Tests completed!" 
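+
+# Optionally keep the full quote fetched in step 2 for offline verification:
+# echo "$QUOTE" > attestation-quote.txt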
+echo "Verify attestation at: https://proof.t16z.com"
diff --git a/confidential-ai/inference/README.md b/confidential-ai/inference/README.md
new file mode 100644
index 0000000..d7170ed
--- /dev/null
+++ b/confidential-ai/inference/README.md
@@ -0,0 +1,110 @@
+# Private LLM Inference
+
+Deploy an OpenAI-compatible LLM endpoint where responses are signed by TEE-derived keys. Clients can verify responses came from genuine confidential hardware.
+
+## Quick Start
+
+```bash
+phala auth login
+phala deploy -n my-inference -c docker-compose.yaml \
+  --instance-type h200.small \
+  -e TOKEN=your-secret-token
+```
+
+The `-e` flag encrypts your token client-side—it only gets decrypted inside the TEE.
+
+First deployment takes 10-15 minutes (5GB+ image, model loading). Watch progress:
+
+```bash
+phala cvms serial-logs --tail 100
+```
+
+Test it (replace `<app-url>` with your deployment's public URL):
+
+```bash
+curl -X POST https://<app-url>/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer your-secret-token" \
+  -d '{"model": "Qwen/Qwen2.5-1.5B-Instruct", "messages": [{"role": "user", "content": "Hello!"}]}'
+```
+
+## How It Works
+
+```mermaid
+graph LR
+    Client -->|TLS| Proxy[vllm-proxy :8000]
+    subgraph TEE[Confidential VM]
+        Proxy -->|HTTP| vLLM
+        Proxy -.->|signs with| Key[TEE-derived key]
+    end
+    Proxy -->|signed response| Client
+```
+
+The proxy authenticates requests, forwards to vLLM, and signs responses with a key derived from the TEE's identity. That signature proves the response came from this specific confidential environment.
+
+## API
+
+OpenAI-compatible. The standard client works directly:
+
+```python
+from openai import OpenAI
+
+client = OpenAI(
+    base_url="https://<app-url>/v1",
+    api_key="your-secret-token"
+)
+
+response = client.chat.completions.create(
+    model="Qwen/Qwen2.5-1.5B-Instruct",
+    messages=[{"role": "user", "content": "Hello!"}]
+)
+```
+
+Additional endpoints:
+
+```bash
+# Get signature for a response (use chat_id from response)
+curl https://<app-url>/v1/signature/<chat_id> -H "Authorization: Bearer $TOKEN"
+
+# Get attestation report
+curl https://<app-url>/v1/attestation/report -H "Authorization: Bearer $TOKEN"
+```
+
+## Using Different Models
+
+Update both the vLLM command and proxy config in docker-compose.yaml:
+
+```yaml
+# vLLM service
+command: --model meta-llama/Llama-3.2-3B-Instruct ...
+
+# Proxy service
+environment:
+  - MODEL_NAME=meta-llama/Llama-3.2-3B-Instruct
+```
+
+For gated models, add your Hugging Face token:
+
+```bash
+phala deploy -n my-inference -c docker-compose.yaml \
+  --instance-type h200.small \
+  -e TOKEN=your-secret-token \
+  -e HF_TOKEN=hf_xxxxx
+```
+
+## Updating Secrets
+
+Change tokens without full redeployment:
+
+```bash
+phala deploy --cvm-id my-inference -c docker-compose.yaml \
+  -e TOKEN=new-secret-token
+```
+
+Old tokens stop working immediately.
+
+## Cleanup
+
+```bash
+phala cvms delete my-inference --force
+```
diff --git a/confidential-ai/inference/docker-compose.yaml b/confidential-ai/inference/docker-compose.yaml
new file mode 100644
index 0000000..0ccaef9
--- /dev/null
+++ b/confidential-ai/inference/docker-compose.yaml
@@ -0,0 +1,42 @@
+# Private LLM Inference in TEE
+#
+# Runs vLLM with vllm-proxy for response signing and attestation.
+# Uses a small model (Qwen 1.5B) for quick testing.
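+#
+# To serve a different model, change the vLLM `--model` flag and the proxy's
+# MODEL_NAME below so they match (see "Using Different Models" in the README).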
+# +# Deploy with encrypted secrets: +# phala deploy -n my-inference -c docker-compose.yaml \ +# --instance-type h200.small \ +# -e TOKEN=your-secret-token + +services: + vllm: + image: vllm/vllm-openai:latest + environment: + - NVIDIA_VISIBLE_DEVICES=all + - HF_TOKEN=${HF_TOKEN:-} + command: > + --model Qwen/Qwen2.5-1.5B-Instruct + --host 0.0.0.0 + --port 8000 + --max-model-len 4096 + --gpu-memory-utilization 0.8 + deploy: + resources: + reservations: + devices: + - driver: nvidia + count: all + capabilities: [gpu] + + proxy: + image: phalanetwork/vllm-proxy:v0.2.18 + volumes: + - /var/run/dstack.sock:/var/run/dstack.sock + environment: + - VLLM_BASE_URL=http://vllm:8000 + - MODEL_NAME=Qwen/Qwen2.5-1.5B-Instruct + - TOKEN=${TOKEN} + ports: + - "8000:8000" + depends_on: + - vllm diff --git a/confidential-ai/inference/requirements.txt b/confidential-ai/inference/requirements.txt new file mode 100644 index 0000000..a8608b2 --- /dev/null +++ b/confidential-ai/inference/requirements.txt @@ -0,0 +1 @@ +requests>=2.28.0 diff --git a/confidential-ai/inference/test.sh b/confidential-ai/inference/test.sh new file mode 100755 index 0000000..3b0f649 --- /dev/null +++ b/confidential-ai/inference/test.sh @@ -0,0 +1,55 @@ +#!/bin/bash +# Test vllm-proxy inference and attestation +# +# Prerequisites: +# - docker compose up (running in background) +# - pip install requests + +set -e + +BASE_URL="${BASE_URL:-http://localhost:8000}" + +echo "Testing vllm-proxy at $BASE_URL" +echo "================================" + +# Wait for service to be ready +echo "Waiting for service..." +for i in {1..60}; do + if curl -s "$BASE_URL/health" > /dev/null 2>&1; then + echo "Service ready" + break + fi + sleep 2 +done + +# Test chat completion +echo -e "\n1. Testing chat completion..." +RESPONSE=$(curl -s "$BASE_URL/v1/chat/completions" \ + -H "Content-Type: application/json" \ + -d '{ + "model": "Qwen/Qwen2.5-1.5B-Instruct", + "messages": [{"role": "user", "content": "Say hello in exactly 5 words"}], + "max_tokens": 50 + }') + +CHAT_ID=$(echo "$RESPONSE" | jq -r '.id') +CONTENT=$(echo "$RESPONSE" | jq -r '.choices[0].message.content') + +echo "Chat ID: $CHAT_ID" +echo "Response: $CONTENT" + +# Test signature retrieval +echo -e "\n2. Testing signature retrieval..." +SIG=$(curl -s "$BASE_URL/v1/signature/$CHAT_ID") +echo "Signing address: $(echo "$SIG" | jq -r '.signing_address')" +echo "Response hash: $(echo "$SIG" | jq -r '.response_hash')" + +# Test attestation +echo -e "\n3. Testing attestation..." +ATTESTATION=$(curl -s "$BASE_URL/v1/attestation?nonce=test-nonce") +QUOTE=$(echo "$ATTESTATION" | jq -r '.quote') +echo "TEE Quote (first 100 chars): ${QUOTE:0:100}..." + +echo -e "\n================================" +echo "All tests passed!" +echo "Verify attestation at: https://proof.t16z.com" diff --git a/confidential-ai/inference/verify.py b/confidential-ai/inference/verify.py new file mode 100644 index 0000000..1aed227 --- /dev/null +++ b/confidential-ai/inference/verify.py @@ -0,0 +1,120 @@ +#!/usr/bin/env python3 +""" +Verify LLM responses from vllm-proxy. + +This script demonstrates how to: +1. Send chat completions to the TEE-hosted LLM +2. Retrieve and verify response signatures +3. 
Fetch TEE attestation quotes +""" + +import argparse +import hashlib +import json +import sys + +import requests + + +def chat_completion(base_url: str, message: str) -> dict: + """Send a chat completion request.""" + response = requests.post( + f"{base_url}/v1/chat/completions", + json={ + "model": "Qwen/Qwen2.5-1.5B-Instruct", + "messages": [{"role": "user", "content": message}], + "max_tokens": 256, + }, + headers={"Content-Type": "application/json"}, + ) + response.raise_for_status() + return response.json() + + +def get_signature(base_url: str, chat_id: str) -> dict: + """Retrieve the signature for a chat completion.""" + response = requests.get(f"{base_url}/v1/signature/{chat_id}") + response.raise_for_status() + return response.json() + + +def get_attestation(base_url: str, nonce: str = "verification-nonce") -> dict: + """Fetch TEE attestation quote.""" + response = requests.get(f"{base_url}/v1/attestation", params={"nonce": nonce}) + response.raise_for_status() + return response.json() + + +def verify_response_hash(response_content: str, expected_hash: str) -> bool: + """Verify response content matches the signed hash.""" + computed = hashlib.sha256(response_content.encode()).hexdigest() + return computed == expected_hash + + +def main(): + parser = argparse.ArgumentParser(description="Verify vllm-proxy responses") + parser.add_argument( + "--url", + default="http://localhost:8000", + help="vllm-proxy URL (default: http://localhost:8000)", + ) + parser.add_argument( + "--message", + default="What is confidential computing?", + help="Message to send", + ) + parser.add_argument( + "--attestation-only", + action="store_true", + help="Only fetch attestation, skip chat", + ) + args = parser.parse_args() + + if args.attestation_only: + print("Fetching attestation...") + attestation = get_attestation(args.url) + print(f"\nTEE Quote (first 200 chars):\n{attestation.get('quote', '')[:200]}...") + if attestation.get("gpu_evidence"): + print(f"\nGPU Evidence: {attestation['gpu_evidence'][:100]}...") + print("\nVerify at: https://proof.t16z.com") + return + + # Send chat completion + print(f"Sending message: {args.message}") + completion = chat_completion(args.url, args.message) + + chat_id = completion["id"] + response_text = completion["choices"][0]["message"]["content"] + + print(f"\nResponse:\n{response_text}") + print(f"\nChat ID: {chat_id}") + + # Get signature + print("\nFetching signature...") + sig = get_signature(args.url, chat_id) + + print(f"Request hash: {sig.get('request_hash', 'N/A')}") + print(f"Response hash: {sig.get('response_hash', 'N/A')}") + print(f"ECDSA sig: {sig.get('ecdsa_signature', 'N/A')[:64]}...") + print(f"Signing addr: {sig.get('signing_address', 'N/A')}") + + # Get attestation + print("\nFetching attestation...") + attestation = get_attestation(args.url) + + quote = attestation.get("quote", "") + print(f"TEE Quote: {quote[:64]}..." if quote else "No quote available") + + # Summary + print("\n" + "=" * 60) + print("VERIFICATION CHECKLIST") + print("=" * 60) + print("1. [ ] TEE quote is valid (verify at proof.t16z.com)") + print("2. [ ] Signing address in quote matches response signer") + print("3. [ ] Response hash matches actual response content") + print(f"\nPaste this quote at https://proof.t16z.com for verification:") + print(quote[:200] + "..." 
if len(quote) > 200 else quote)
+
+
+if __name__ == "__main__":
+    main()
diff --git a/confidential-ai/training/README.md b/confidential-ai/training/README.md
new file mode 100644
index 0000000..7ddd2bf
--- /dev/null
+++ b/confidential-ai/training/README.md
@@ -0,0 +1,45 @@
+# Confidential Training
+
+Fine-tune LLMs on sensitive data using JupyterLab inside a TEE. Upload your dataset through the browser, run training cells, download your model.
+
+## Quick Start
+
+```bash
+phala auth login
+phala deploy -n my-training -c docker-compose.yaml \
+  --instance-type h200.small \
+  -e HF_TOKEN=hf_xxxxx \
+  -e JUPYTER_PASSWORD=your-secret
+```
+
+Open `https://<app-url>:8888` (your deployment's public URL) and log in with your password.
+
+## Workflow
+
+```mermaid
+graph LR
+    You -->|upload data| Jupyter
+    subgraph TEE[Confidential VM]
+        Jupyter[JupyterLab]
+        Jupyter --> GPU[Confidential GPU]
+    end
+    Jupyter -->|download model| You
+```
+
+1. Use the pre-installed notebooks in `/workspace/unsloth-notebooks/` or upload `finetune.ipynb` from this repo
+2. Upload your training data (JSONL with `instruction` and `response` fields)
+3. Run the cells to train
+4. Download the output folder with your fine-tuned weights
+
+Your data and model weights stay encrypted in GPU memory throughout training.
+
+## Cleanup
+
+```bash
+phala cvms delete my-training --force
+```
+
+## Further Reading
+
+- [Confidential AI Guide](https://github.com/Dstack-TEE/dstack/blob/master/docs/confidential-ai.md)
+- [Unsloth](https://github.com/unslothai/unsloth) for the training library
diff --git a/confidential-ai/training/docker-compose.yaml b/confidential-ai/training/docker-compose.yaml
new file mode 100644
index 0000000..67ce674
--- /dev/null
+++ b/confidential-ai/training/docker-compose.yaml
@@ -0,0 +1,26 @@
+# Confidential Fine-tuning with Jupyter
+#
+# Run JupyterLab inside a TEE for interactive training.
+# Upload data via browser, run training cells, download results.
+#
+# Deploy: phala deploy -n my-training -c docker-compose.yaml \
+#   --instance-type h200.small -e HF_TOKEN=hf_xxx -e JUPYTER_PASSWORD=your-secret
+
+services:
+  jupyter:
+    image: unsloth/unsloth:latest
+    volumes:
+      - /var/run/dstack.sock:/var/run/dstack.sock
+    environment:
+      - NVIDIA_VISIBLE_DEVICES=all
+      - HF_TOKEN
+      - JUPYTER_PASSWORD
+    ports:
+      - "8888:8888"
+    deploy:
+      resources:
+        reservations:
+          devices:
+            - driver: nvidia
+              count: all
+              capabilities: [gpu]
diff --git a/confidential-ai/training/finetune.ipynb b/confidential-ai/training/finetune.ipynb
new file mode 100644
index 0000000..ea3d71d
--- /dev/null
+++ b/confidential-ai/training/finetune.ipynb
@@ -0,0 +1,162 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Confidential Fine-tuning\n",
+    "\n",
+    "Fine-tune an LLM on your data inside a TEE. Upload your dataset, run the cells, download your model."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 1. 
Configure\n", + "\n", + "Upload your training data using the file browser (left sidebar), then set the path below.\n", + "\n", + "Data format: JSONL with `instruction` and `response` fields:\n", + "```json\n", + "{\"instruction\": \"What is 2+2?\", \"response\": \"4\"}\n", + "{\"instruction\": \"Explain gravity\", \"response\": \"Gravity is...\"}\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "DATA_PATH = \"data.jsonl\" # Path to your uploaded data\n", + "MODEL_NAME = \"unsloth/Llama-3.2-1B-Instruct\"\n", + "OUTPUT_DIR = \"output\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 2. Load Model" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from unsloth import FastLanguageModel\n", + "\n", + "model, tokenizer = FastLanguageModel.from_pretrained(\n", + " model_name=MODEL_NAME,\n", + " max_seq_length=2048,\n", + " load_in_4bit=True,\n", + ")\n", + "\n", + "model = FastLanguageModel.get_peft_model(\n", + " model,\n", + " r=16,\n", + " lora_alpha=16,\n", + " lora_dropout=0,\n", + " target_modules=[\"q_proj\", \"k_proj\", \"v_proj\", \"o_proj\"],\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3. Load Data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import json\n", + "from datasets import Dataset\n", + "\n", + "with open(DATA_PATH) as f:\n", + " data = [json.loads(line) for line in f]\n", + "\n", + "formatted = [\n", + " {\"text\": f\"### Instruction:\\n{item['instruction']}\\n\\n### Response:\\n{item['response']}\"}\n", + " for item in data\n", + "]\n", + "\n", + "dataset = Dataset.from_list(formatted)\n", + "print(f\"Loaded {len(dataset)} examples\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 4. Train" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from transformers import TrainingArguments\n", + "from trl import SFTTrainer\n", + "\n", + "trainer = SFTTrainer(\n", + " model=model,\n", + " train_dataset=dataset,\n", + " dataset_text_field=\"text\",\n", + " max_seq_length=2048,\n", + " args=TrainingArguments(\n", + " output_dir=OUTPUT_DIR,\n", + " per_device_train_batch_size=2,\n", + " gradient_accumulation_steps=4,\n", + " num_train_epochs=1,\n", + " learning_rate=2e-4,\n", + " fp16=True,\n", + " logging_steps=1,\n", + " save_strategy=\"epoch\",\n", + " ),\n", + " tokenizer=tokenizer,\n", + ")\n", + "\n", + "trainer.train()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 5. Save & Download" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "model.save_pretrained(OUTPUT_DIR)\n", + "tokenizer.save_pretrained(OUTPUT_DIR)\n", + "print(f\"Model saved to {OUTPUT_DIR}/\")\n", + "print(\"Use the file browser to download the output folder.\")" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +}