ATLAS

Active-inference Training with Learned Adaptive Stigmergy

"Don't train on what humans wrote about the world.
Train on what you actually discover about the world.
Validate what you claim. Own what you build."

An open source contribution from OpenHub Research (Thailand)
Website: atlasagi.org · Observatory · Author: Robin Dey · Institution: https://openhubresearch.org/

What is ATLAS?

ATLAS is a next-generation LLM training framework built in pure Rust with zero external crate dependencies — the SQLite principle applied to AI infrastructure.

It fuses four architectural innovations:

Component	Role	Key property
ASTRA-dev	Live discovery engine	~10s/cycle, NASA/WHO/World Bank APIs, causal inference
GraphPalace	Stigmergic memory	Pheromone-guided curriculum, O(1/√T) convergence
TRM-CausalValidator	Recursive validator	7M params, 0.1% compute, Quality Gate 6
ZK Schnorr proofs	Provenance chain	LLM output → live API, cryptographically verifiable

v4.0.0 — Champagnat n-morphic Framework + OLMo-3-7B Fix:

🧬 InvasionFitnessScorer — morphic fitness f(y) = success − cost − Σcos_sim·n̄ (fixes pheromone saturation)
🌊 CanonicalPheromoneUpdate — principled decay Δρ ∝ μ·σ²·n̄·∂₁s (Champagnat-Méléard 2011)
⚖️ BarBovier2017Constraints — stability gate: explore_ratio × batch_size > 10, temp > 1/√batch
🔀 CognitiveBranching — n-morphic OODA bifurcation on plateau detection
🔆 HJConcentrationPrior — Hopf-Cole sharpening T_eff(s) = T₀/(1+γs) in TRM recursion
🔧 Issue #7 fix — OLMo-3-7B SWA (24/32 sliding layers, window=4096) + YaRN RoPE + config.json auto-patch

v4.0.2 — BF16 GPU Inference Path (Issue #9):

⚡ BF16 W16A32 — weights in BF16 (14 GB) vs f32 (28 GB); GpuBufBf16, GpuBufKind, upload_bf16() in atlas-tensor
🔥 GEMV kernels — sgemv_bf16_kernel + sgemv_f32_kernel: one-warp-per-row for N=1 decode; fixes 32× tiled-GEMM inefficiency
🚀 OLMo-3-7B-Think: 4.1 → 19.9 tok/s (4.8× speedup, A100-SXM4-40GB, W16A32)

v4.0.3 — Math Integrity Fixes (Issue #11):

🧮 CanonicalPheromoneUpdate λ decay — replaced linear formula base_rate × (1 − canonical_term) (went negative when term > 1, dead gradient at clamp boundary) with base_rate × exp(−canonical_term): always positive, smooth, zero-gradient fidelity, hardware-safe for v6 ASIC spec
🏆 InvasionFitnessScorer competition kernel — fixed negative Lotka-Volterra coefficients: raw cosine_sim ∈ [−1, 1] was giving fitness bonuses to anti-correlated strategies (mutualism, not competition); replaced with α_ij = ReLU(cos_sim − 0.2) — threshold at 4σ above noise floor in d=384 embedding space; competition_threshold added to InvasionFitnessConfig
✅ 532/532 tests (+4 new regression tests); GPU validated: 47/47 A100 model tests, OLMo-3-7B-Think still 19.9 tok/s

v4.0.7 — OLMo-2/3 Post-Norm + QK-Norm Architecture Fixes:

🏗️ Post-norm layer ordering — OLMo-2/3 uses x = residual + rmsnorm(output) (normalize output, then add residual). ATLAS was incorrectly doing x = rmsnorm(x + output) (GPU) or pre-norm (CPU). Fixed both paths to match HuggingFace Olmo2DecoderLayer reference implementation.
🔧 QK-norm per-head weight slicing — q_norm.weight has shape [n_heads × head_dim] — each head has unique norm weights. rmsnorm_inplace was always using weights[0..128] for every head instead of weights[h*128..(h+1)*128]. Fixed GPU path.
➕ QK-norm added to CPU attention path — the CPU Attention::forward_token() had no QK-norm at all. Added per-head QK-norm before RoPE to match GPU path.
✅ Before → After: CPU/GPU logit agreement went from max diff 20.0 → 0.000015. Top-1 token for "capital of France" went from yp/décor → Paris. OLMo-3-7B-Think now produces coherent <think> reasoning traces at 15.4 tok/s on A100.

v4.0.6 — Sampling Controls (Issue #16):

🎛️ Full sampling pipeline: repetition penalty, temperature, top-p, top-k, min-p, frequency/presence penalty — 7-stage pipeline eliminates model text degeneration
🔧 SamplingConfig struct with ::olmo3() preset (temp=0.6, rep_penalty=1.1, top_p=0.95, top_k=50, min_p=0.05)
🆕 generate_with_sampling() — full control; generate() remains backward-compatible
📡 OpenAI-compatible API — top_p, repetition_penalty, frequency_penalty, presence_penalty in request body
✅ 562/562 tests (+13 new: repetition penalty, top-p/top-k/min-p filtering, frequency/presence penalties, greedy+sampling)

v4.0.5 — Inference Pipeline Fixes (Issues #13, #14, #15):

🛑 EOS stopping — generate() now stops on EOS token (was dead code: if let Some(eos) = None::<u32>); parsed from config.json (OLMo-3: 100257), wired through model → API finish_reason: "stop" works correctly
🎲 Stateful PRNG — XorShift64 replaces deterministic step-based LCG hash; re-seeded from system time each generate() call; repeated requests now produce different completions
💬 ChatML template — <|im_start|>/<|im_end|> format (OLMo-3, SmolLM2, Qwen); auto-detected from tokenizer special tokens; also supports Llama-3 format; fixes garbage output from wrong <|system|>/<|user|> tokens
✅ 549/549 tests (+10 new: EOS stopping, PRNG variability, ChatML/Llama3/Generic templates)

v4.0.4 — GPT-4 Regex Tokenizer (Issue #12):

🔤 Full HuggingFace tokenizer.json support — hand-coded GPT-4/OLMo-3/LLaMA-3 pre-tokenization regex (zero external deps): contractions, word boundaries, 3-digit number grouping, punctuation, newlines, whitespace with backtracking
✅ Verified against HuggingFace tokenizers — OLMo-3 encode("The capital of France is") → [791, 6864, 315, 9822, 374], SmolLM2 verified, round-trip decode perfect
🧪 End-to-end GPU test — tokenize→generate→decode on A100, chat template with <|im_start|>/<|im_end|> special tokens
✅ 539/539 tests (+7 pre-tokenizer unit + 2 integration + 1 e2e GPU)

The result: a self-improving scientific intelligence that trains on what it actually discovers about the world — real causal relationships from live data, validated by recursive architecture, guided by stigmergic memory.

Nobody has built this before. See CHARTER.md for the full architecture.

The Big Idea

Every other LLM is trained on:

What humans wrote on the internet (web scrapes, Wikipedia)
Synthetic data generated by another LLM (GPT-4 distillation)
Human-curated datasets (expensive, frozen at curation time)

atlas-7b is trained on:

What an autonomous science engine actually discovers about the world
Real causal relationships extracted from live NASA, WHO, World Bank APIs
Validated findings with Bayesian confidence scores and PC/FCI causal inference
A corpus that grows every 10 seconds and never contains stale or duplicated information

This is not a better fine-tuning recipe. This is a different paradigm for what training data can be.

Pure Rust — Zero Dependencies

The SQLite principle applied to AI infrastructure.

atlas/
├── Cargo.toml          # workspace root — [dependencies] is empty by design
├── kernels/
│   ├── matmul.cu       # raw CUDA kernel (no cudarc crate)
│   ├── attention.cu    # flash attention from scratch
│   └── quant.cu        # INT4/INT8 quantization
└── crates/
    ├── atlas-core/     # error types, traits, config
    ├── atlas-tensor/   # Tensor + CUDA FFI (the seed of everything)
    ├── atlas-grad/     # autograd tape, backward pass
    ├── atlas-optim/    # AdamW, cosine LR scheduler
    ├── atlas-quant/    # INT4/INT8 quantization, QLoRA
    ├── atlas-model/    # transformer: MultiHeadAttn, FFN, RMSNorm, RoPE
    ├── atlas-tokenize/ # BPE tokenizer (sentencepiece port)
    ├── atlas-palace/   # GraphPalace stigmergic memory: A* search, 5-type pheromones, Active Inference
    ├── atlas-mcp/      # MCP server: 28 palace tools via JSON-RPC 2.0 stdio + connection pool
    ├── atlas-api/      # OpenAI-compatible HTTP endpoint: /v1/chat/completions, SSE streaming
    ├── atlas-trm/      # TRM-CausalValidator (7M params, arXiv:2510.04871)
    ├── atlas-causal/   # PC/FCI causal inference (py-causal port)
    ├── atlas-bayes/    # Bayesian confidence scoring
    ├── atlas-astra/    # ASTRA OODA engine (~8K LOC, full port)
    ├── atlas-corpus/   # LiveDiscoveryCorpus + DeepSupervisionTrainer + quality gates
    ├── atlas-zk/       # ZK Schnorr proofs (asi-build port)
    ├── atlas-http/     # HTTP client via raw libc syscalls
    ├── atlas-json/     # JSON parser from source
    ├── atlas-safety/   # Horn-clause safety constitution, 5-state FSM, CircuitBreaker
    ├── atlas-bridge/   # ZK-attested Rings↔ETH interface (Sepolia-compatible)
    └── atlas-cli/      # CLI: train / discover / eval / prove / mcp / api / bench

21 crates. One coherent system. Zero external Rust dependencies.

CUDA is called via raw extern "C" FFI from build.rs + .cu kernel files — no cudarc, no tch, no candle. The same approach that makes SQLite trustworthy, applied to GPU compute.

// atlas-tensor/src/lib.rs — the first line of ATLAS
pub struct Tensor {
    data:  Vec<f32>,
    shape: Vec<usize>,
}

Every billion-parameter transformer starts here.

Seven Pillars

GraphPalace Memory — pheromone-weighted persistent knowledge; search_by_embedding(), hot_paths(), deposit_pheromones()
Morphic Warm-Start — O(1/√T) cross-run convergence (proven in BUTTERS, R²=0.982, p<10⁻³⁰)
Stigmergic RLVR — r_total = α·r_verifiable + β·r_pheromone; pheromone decay prevents reward hacking
Active Inference Data Gen — palace cold spots direct ASTRA to fill knowledge gaps
ZK Knowledge Claims — Schnorr proof chain from LLM output to raw API data; hallucinations have broken proof trails
LiveDiscoveryCorpus — ASTRA's output as a living training dataset; ~86K quality examples/month
TRM-CausalValidator — 7M-param recursive validator; z = net(x,y,z) × 6 recursions; Quality Gate 6; generates Type 5 training traces

GPU Inference

ATLAS v4.0.0 delivers a fully GPU-resident forward pass — hidden states stay in VRAM between tokens, with pre-pinned weight upload at model load time.

A100-SXM4-40GB Benchmark (sm_80, CUDA 13.0)

Model	Params	GPU tok/s	VRAM	Notes
SmolLM2-135M	135M	37.7	507 MiB	f32, sm_80
SmolLM2-360M	360M	25.4	~1.4 GB	f32
SmolLM2-1.7B	1.7B	12.6	~6.5 GB	f32, 2.4× over CPU
TinyLlama-1.1B	1.1B	20.9	~8.4 GB	f32
OLMo-3-7B-Think	7B	15.4	~14 GB	BF16 W16A32 (v4.0.7); post-norm + QK-norm fixed, correct output

CUDA Kernel Suite

Kernel	What it does
`rmsnorm_forward`	RMSNorm in CUDA — replaces per-token CPU loop
`rope_forward`	RoPE rotation — parallel over heads
`silu_mul_forward`	SwiGLU gate fused — single CUDA pass
`atlas_adamw_step`	AdamW optimizer step entirely on GPU
`sgemm_vec`	Zero-copy matrix×vector; `GpuVec` activation buffer

CUDA portability: all kernels use rsqrtf() (not __rsqrtf()) for cross-platform compatibility.

atlas-api — OpenAI-Compatible Endpoint

ATLAS v4.0.0 adds atlas-api — an OpenAI-compatible HTTP inference server. Drop-in replacement for any OpenAI API client.

# Start the server
./target/release/atlas api serve --model /home/user/models/smollm2-135m --port 8080

Endpoints

Endpoint	Method	Description
`/v1/chat/completions`	POST	Chat completions with SSE streaming
`/v1/completions`	POST	Text completions
`/v1/models`	GET	List available models

Usage Examples

# Chat completion (streaming)
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "atlas",
    "messages": [{"role": "user", "content": "What is morphic resonance?"}],
    "stream": true
  }'

# Non-streaming
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "atlas",
    "messages": [{"role": "user", "content": "Explain stigmergic memory"}],
    "stream": false
  }'

# List models
curl http://localhost:8080/v1/models

Features: SSE streaming, CORS headers, echo mode for testing, 40 tests, 0 external dependencies.

DeepSupervisionTrainer

The DeepSupervisionTrainer in atlas-corpus implements multi-pass deep supervision — each training batch runs N_sup=4..16 forward passes, summing loss across all supervision points with pheromone-driven latent carry between passes.

// atlas-corpus/src/deep_supervision.rs
pub struct DeepSupervisionTrainer {
    pub n_sup: usize,          // 4..16 forward passes per batch
    pub latent_carry: bool,    // carry hidden state between passes
    pub pheromone_weight: f32, // pheromone × mutation-selection coupling
    pub loss_trace: Vec<f32>,  // per-pass loss telemetry
}

Theory (TRM arXiv:2510.04871 validated): deep supervision accounts for >75% of TRM's improvement over baseline transformers. The DeepSupervisionTrainer implements this in the training loop — each N_sup pass is one phenotypic morph; latent carry approximates Lotka-Volterra equilibrium n̄ᵢ; pheromone_weight sets the mutation-selection coupling μ.

Convergence prediction: doubling N_sup → √2 speedup in O(1/√T) morphic convergence. Testable via the loss_trace telemetry.

Horn-Clause Safety Constitution

atlas-safety v4.0.0 adds a tractable safety constitution implemented as Horn clauses, alongside the existing 5-state FSM and CircuitBreaker.

8 safety principles across 4 non-overlapping domains:
  ┌─────────────────┬──────────────────────────────────┐
  │ capability      │ scope_limits, capability_bounds  │
  │ data            │ provenance_required, dedup_gate  │
  │ deployment      │ audit_trail, circuit_breaker     │
  │ reasoning       │ causal_grounding, zk_verifiable  │
  └─────────────────┴──────────────────────────────────┘

Why Horn clauses? Young (2026, arXiv:2501.15446) proves NP-hardness of general safety constitution verification. Horn-clause restriction (≤12 principles, 4 non-overlapping domains) ensures polynomial tractability — the safety checker can verify any system state in O(n·m) where n = principles, m = state predicates. No exponential blowup.

PalaceBackend Trait

atlas-palace v4.0.0 extracts a PalaceBackend trait, enabling pluggable storage backends without API changes:

pub trait PalaceBackend: Send + Sync {
    fn search(&self, query: &str, limit: usize) -> Vec<DrawerMatch>;
    fn deposit_pheromones(&mut self, path: &[RoomId], ptype: PheromoneType, intensity: f32);
    fn navigate(&self, from: RoomId, to: RoomId) -> Vec<RoomId>;
    fn hot_paths(&self, limit: usize) -> Vec<Path>;
    // ... 32 additional methods
}

// Palace implements PalaceBackend — fully trait-object safe
pub struct Palace { /* existing implementation */ }
impl PalaceBackend for Palace { /* ... */ }

// Swap backends without changing caller code
let palace: Box<dyn PalaceBackend> = Box::new(Palace::new(config));

This is the prerequisite for LadybugDB migration (Q3 2026) — a drop-in Grafeo/LadybugDB backend can replace the default implementation with zero API changes.

Build Order (7 Stages, ~22 Weeks)

Stage	Weeks	Crates	Milestone
1	1–4	atlas-core → tensor → grad → optim → quant	f32 matmul CPU+GPU, backward pass through 2-layer MLP
2	5–7	atlas-model → tokenize	OLMo 3 7B forward pass in pure Rust, token generation
3	8–9	atlas-palace + atlas-mcp	GraphPalace 36-method engine native, MCP server
4	10–11	atlas-trm	TRM-CausalValidator, <10ms causal graph pass/fail
5	12–16	http → json → bayes → causal → zk → astra	Full ASTRA OODA in Rust, ZK provenance
6	17–20	atlas-corpus + atlas-api	QLoRA SFT, DeepSupervisionTrainer, OpenAI API
7	21–22	atlas-zk (ext) → cli	End-to-end proof chain, atlas-7b release binary

Architecture Diagrams

Eight publication-quality figures are in docs/dashboard/diagrams/. The interactive dashboard (project overview, roadmap, papers, component status) is at docs/dashboard/index.html.

Figure	Description
Fig. 1	Full System Architecture (v3.0, TRM cluster)
Fig. 2	Discovery Flywheel — the self-improving loop
Fig. 3	ASTRA OODA + GraphPalace integration
Fig. 4	Morphic Warm-Start cross-run convergence
Fig. 5	Stigmergic RLVR pheromone reward function
Fig. 6	ZK Provenance Chain
Fig. 7	Training Pipeline phase roadmap
Fig. 8	Hybrid Generative-Recursive Architecture (TRM v3.0)

Paper Strategy

Paper	Venue	Contribution
Paper 1	EMNLP 2026	ATLAS architecture + LiveDiscoveryCorpus
Paper 2	NeurIPS 2026	Discovery Flywheel — closed-loop scientific intelligence
Paper 3	ICML 2027	Stigmergic RLVR — pheromone reward prevents policy collapse
Paper 4	ICLR 2027	O(1/√T) morphic convergence for LLMs (co-author Robin Dey)
Paper 5	IEEE S&P 2027	End-to-end ZK provenance for LLM outputs
Paper 6	ICLR/NeurIPS 2027	Hybrid generative-recursive architecture (TRM integration)

Getting Started

git clone https://github.com/web3guru888/ATLAS.git
cd ATLAS

# Run all tests (excludes CUDA-requiring tensor tests on CPU-only machines)
cargo test --workspace --exclude atlas-tensor

# Build the atlas binary
cargo build --release -p atlas-cli

# Full OODA discovery loop
./target/release/atlas discover --cycles 5 --output corpus.json

# Train on discoveries
./target/release/atlas train --corpus corpus.json --epochs 3

# Start OpenAI-compatible API server
./target/release/atlas api serve --model /path/to/model --port 8080

# ZK-prove a claim
./target/release/atlas prove --claim "Pheromone trails compound information gain" \
    --secret $(openssl rand -hex 16)

# Inspect palace memory
./target/release/atlas palace --stats --hot

# MCP server (connect to Claude Desktop / Cursor)
./target/release/atlas mcp serve --palace my-palace.json

Prerequisites:

Rust 1.75+ (rustup update stable)
CUDA 12.x + nvcc (optional; falls back to CPU if absent)
GPU with sm_75+ (Tesla T4 / A100+) for CUDA training path

Status — v4.0.7

562/562 tests passing · 21 crates · Zero external crate dependencies · CUDA sm_80 on A100-SXM4-40GB · 15.4 tok/s OLMo-3-7B-Think (BF16)

🏔 v4.0.7 is the current release. Three critical OLMo-2/3 architecture bugs fixed: post-norm layer ordering (CPU + GPU), QK-norm per-head weight slicing, and missing CPU QK-norm. OLMo-3-7B-Think now produces correct, coherent output with CPU/GPU logit agreement at 0.000015 max diff. Full sampling pipeline (v4.0.6): repetition penalty, temperature, top-p, top-k, min-p, frequency/presence penalty.

What Works

✅ Discovery is real — atlas discover --cycles 3 hits NASA POWER, WHO GHO, World Bank, ArXiv live APIs; causal inference via PC algorithm; Bayesian quality gates
✅ Memory is real — 5-type pheromone system (exploitation/exploration/success/traversal/recency), MMAS ceiling, A* semantic pathfinding (α·C_sem + β·C_phe + γ·C_str), Active Inference agents; atlas palace --hot shows pheromone trails
✅ Training is real — SFT with GradTape + AdamW + LoRA (rank=8) + gradient accumulation + safetensors checkpoint; DeepSupervisionTrainer (N_sup=4..16, loss trace, latent carry)
✅ GPU inference is real — SmolLM2-135M at 37.7 tok/s on A100-SXM4-40GB; OLMo-3-7B-Think at 15.4 tok/s (BF16 GPU, W16A32, 14 GB VRAM); SWA + YaRN RoPE; post-norm + QK-norm architecture (v4.0.7 fixed — CPU/GPU logit diff 0.000015)
✅ API is real — atlas api serve exposes /v1/chat/completions + /v1/completions + /v1/models; SSE streaming; CORS; 40 tests
✅ Provenance is real — Schnorr proofs + Groth16 stub (HMAC-SHA256, BLS12-381-compatible interface) + ProvenanceChain; atlas prove generates verifiable proofs
✅ Safety is real — Horn-clause constitution (8 principles, 4 domains, Young 2026 NP-hardness validated); 5-state FSM (BOOT→NOMINAL→DEGRADED→SAFE_MODE→EMERGENCY_STOP); CircuitBreaker; append-only audit log
✅ Bridge is real — AtlasBridge with ZK-attested deposit/withdraw, Sepolia chain_id=11155111, Groth16 proof per transaction
✅ MCP is real — atlas mcp serve exposes 28 tools via JSON-RPC 2.0; McpConnectionPool (max 5, 5-min idle eviction); connects to Claude Desktop / Cursor

Version History

Version	Theme	Tests
v0.1.0	Infrastructure: f32 matmul, backward pass, GPU (7 stages)	186
v0.2.0	Real Memory Palace + MCP (28 tools, JSON-RPC 2.0)	236
v0.3.0 + v0.4.0	Real Discovery Engine + Validated Model Loading	260
v0.5.0	Real Training Loop (LoRA, grad-accum, safetensors checkpoint)	353
v0.6.0	Safety FSM + Groth16 stub + ZK Bridge	383
v0.7.0	Benchmarks, CI, CHANGELOG, REPRODUCIBILITY	383
v1.0.0	Production Release — all milestones complete	383
v2.0.0	CAS Decay + OODA Feedback + Stigmergic Sampler + GPU dispatch (37.7 tok/s on A100)	400
v3.0.0-α.1	atlas-api + PalaceBackend + GPU-resident forward pass + DeepSupervisionTrainer + Horn-clause safety	426
v4.0.0	Champagnat n-morphic framework + Issue #7 fix (SWA + YaRN RoPE + config.json auto-patch for OLMo-3-7B)	528
v4.0.1	Docs + test cleanup for v4.0.0 / Issue #7	528
v4.0.2	BF16 GPU inference path (Issue #9): OLMo-3-7B-Think 4.1 → 19.9 tok/s (4.8×), W16A32, GEMV kernels	528
v4.0.3	Math integrity (Issue #11): λ exp decay + ReLU competition threshold. 47/47 GPU model tests.	532
v4.0.4	GPT-4 regex tokenizer (Issue #12): full HuggingFace tokenizer.json support. OLMo-3 + SmolLM2 verified. E2E GPU test.	539
v4.0.5	Inference pipeline fixes (Issues #13–15): EOS stopping, XorShift64 PRNG, ChatML auto-detection.	549
v4.0.6	Sampling controls (Issue #16): repetition penalty, top-p, top-k, min-p, freq/pres penalty. 7-stage pipeline.	562
v4.0.7	OLMo-2/3 post-norm + QK-norm fixes: 3 architecture bugs. CPU/GPU logit diff 20.0 → 0.000015. Correct OLMo-3 output.	562

Crate Status

Crate	Stage	Tests	Status
atlas-core	1	2	✅ Error types, Result, traits
atlas-tensor	1	6	✅ CPU+GPU matmul, INT8/INT4, sm_80 kernels (A100); GPU AdamW kernel; sgemm_vec zero-copy; BF16 GEMV (`GpuBufBf16`, `sgemv_bf16_kernel`, W16A32 inference path)
atlas-grad	1	9	✅ GradTape, matmul/relu/add backward
atlas-optim	1	6	✅ AdamW + CosineScheduler, warmup
atlas-quant	1	7	✅ INT8, INT4, symmetric scaling
CUDA kernels	1	—	✅ tiled GEMM, rmsnorm, rope, silu_mul, AdamW, INT8/INT4 — compiled on A100-SXM4-40GB (sm_80)
atlas-json	2	12	✅ Recursive descent parser, surrogate pairs
atlas-tokenize	2	14	✅ GPT-4 regex pre-tokenization (7 alts w/ backtracking), byte-level BPE, HF tokenizer.json; OLMo-3 + SmolLM2 verified
atlas-model	2	27	✅ OLMo 3 / Llama 3, RoPE, GQA, SwiGLU, SWA, YaRN RoPE, config.json auto-patch; GPU-resident forward pass; v4.0.7: post-norm architecture + QK-norm per-head slicing
atlas-palace	3	79	✅ A* search, 5-type pheromones, Active Inference, MMAS, PalaceBackend trait, session_id, PalaceConfig; v4.0.3: `CanonicalPheromoneUpdate` uses `exp(−x)` decay (always positive, smooth, hardware-safe)
atlas-mcp	3	32	✅ 28 MCP tools, JSON-RPC 2.0, live palace dispatch; McpConnectionPool (max 5, 5-min idle eviction)
atlas-api	3	40	✅ OpenAI-compatible HTTP: /v1/chat/completions, /v1/completions, /v1/models; SSE streaming; CORS
atlas-trm	4	12	✅ TRM-CausalValidator depth-6 RNN, Bayesian combining
atlas-http	5	11	✅ HTTP/1.1 TcpStream, chunked decoding, curl HTTPS
atlas-bayes	5	13	✅ BetaPrior, BayesNetwork, QualityGate, Jaccard novelty
atlas-causal	5	10	✅ PC algorithm, Fisher-Z, standard normal CDF, Meek rules
atlas-zk	5	19	✅ Schnorr + Groth16 stub (HMAC-SHA256, BLS12-381 interface)
atlas-astra	5	15	✅ OODA: NASA POWER / WHO GHO / World Bank / ArXiv; OodaFeedback adaptive explore_ratio
atlas-corpus	6	79	✅ SftTrainer, LoRA (rank=8), grad-accum, safetensors checkpoint; DeepSupervisionTrainer (N_sup 4–16, loss_trace); v4.0.3: `InvasionFitnessScorer` uses `ReLU(cos_sim − 0.2)` competition (α_ij ≥ 0, no mutualism)
atlas-safety	6	30	✅ Horn-clause constitution (8 principles, 4 domains); 5-state FSM; CircuitBreaker; append-only audit log
atlas-bridge	6	8	✅ ZK-attested Rings↔ETH interface, Sepolia chain_id=11155111
atlas-cli	7	30	✅ discover / corpus / train / eval / prove / palace / mcp / api / bench / status
TOTAL		562	✅ All passing — v4.0.7

Quick Start

git clone https://github.com/web3guru888/ATLAS.git
cd ATLAS
cargo build --release -p atlas-cli

# Full OODA discovery + training loop
./target/release/atlas discover --cycles 3 --output my-corpus.json
./target/release/atlas train --corpus my-corpus.json --epochs 2
./target/release/atlas prove --claim "CO2 drives warming" --secret deadbeef01020304
./target/release/atlas palace --stats --hot

# OpenAI-compatible API server
./target/release/atlas api serve --model /path/to/model --port 8080

# MCP server (connect to Claude Desktop / Cursor)
./target/release/atlas mcp serve --palace my-palace.json

# Run benchmarks
./target/release/atlas bench --all

atlas-mcp — Model Context Protocol Server

ATLAS exposes its memory palace as 28 MCP tools via stdio JSON-RPC 2.0, ready for Claude Desktop, Cursor, or any MCP client. v4.0.0 adds McpConnectionPool — lazy pool (max 5 connections, 5-min idle eviction) preventing connection leaks across concurrent MCP clients.

# Add to your Claude Desktop config (~/.config/claude/claude_desktop_config.json)
{
  "mcpServers": {
    "atlas-palace": {
      "command": "./target/release/atlas",
      "args": ["mcp", "--palace", "my-palace.json"]
    }
  }
}

Tool categories:

Category	Tools	Examples
Navigation	8	`palace_search`, `palace_navigate`, `palace_find_similar`
Operations	5	`palace_add_wing`, `palace_add_room`, `palace_add_drawer`
Knowledge Graph	7	`palace_kg_add`, `palace_kg_query`, `palace_kg_contradictions`
Stigmergy	5	`palace_deposit_pheromones`, `palace_hot_paths`, `palace_cold_spots`
Agent Diary	3	`palace_create_agent`, `palace_diary_write`, `palace_diary_read`

Every tool call modifies live palace state. Pheromone trails compound across sessions. Knowledge graphs grow with every interaction.

Benchmarks

ATLAS includes a zero-dependency benchmark suite using atlas_core::bench::Bench. Run with:

cargo test --workspace --exclude atlas-tensor -- --ignored --nocapture

Representative results (Ubuntu, Rust 1.95, A100-SXM4-40GB, CUDA 13.0):

Benchmark	Metric	Description
`gpu_inference_smollm2`	37.7 tok/s	SmolLM2-135M GPU inference (f32), A100-SXM4-40GB
`gpu_benchmark_olmo3_7b_think_bf16`	15.4 tok/s	OLMo-3-7B-Think BF16 GPU inference (W16A32), A100-SXM4-40GB, v4.0.7
`palace_search_1000`	~50–200 µs/op	TF-IDF semantic search across 1000 drawers
`astar_100_nodes`	~20–100 µs/op	Pheromone-guided A* pathfinding (100-node KG)
`pheromone_deposit_decay_1000`	~5–20 µs/op	10 deposits + full decay cycle per iteration
`kg_query_100_edges`	~0.5–2 µs/op	KG edge lookup from a source node
`rmsnorm_2048`	~1–5 µs/op	RMSNorm on 2048-dim vector
`rope_128dim_apply`	~50–200 ns/op	RoPE rotation on a single attention head
`schnorr_prove_verify`	~200–500 ns/op	Schnorr ZK proof generation + verification
`json_parse_1kb`	~5–20 µs/op	Parse a 1KB JSON document (zero-dep parser)

Note: Numbers vary by hardware. Run benchmarks on your own machine for accurate results.

Key Numbers

37.7 tok/s — GPU inference throughput (SmolLM2-135M on A100-SXM4-40GB, v4.0.0)
15.4 tok/s — GPU inference throughput (OLMo-3-7B-Think, BF16 W16A32, A100-SXM4-40GB, v4.0.7; was 4.1 tok/s CPU = 3.75× speedup; correct post-norm + QK-norm architecture)
2.4× — GPU speedup over CPU inference (SmolLM2-1.7B: 12.6 vs 5.2 tok/s)
507 MiB — VRAM for pre-pinned SmolLM2-135M weights
d = 10.6 — Cohen's d for palace-memory vs. no-memory (ASTRA experiments)
34.4× — more discoveries with memory than without
R² = 0.982 — O(1/√T) convergence fit (BUTTERS morphic warm-start)
1.83× — cross-domain novelty acceleration (DC-24 experiment)
7M params — TRM-CausalValidator size vs. 7B base model (1000× smaller)
45% — TRM accuracy on ARC-AGI-1 (Samsung SAIL Montreal, arXiv:2510.04871)
<10ms — target TRM validation latency per causal graph
~86K — quality-gated training examples per month from ASTRA
8 principles / 4 domains — Horn-clause safety constitution (Young 2026, arXiv:2501.15446)

v4.0 — Champagnat n-Morphic Framework ✅ Implemented

ATLAS v4.0 implements the Champagnat n-Morphic Framework (Issue #6), grounded in Champagnat-Méléard 2011 (PTRF) and Baar-Bovier-Champagnat 2017 (AAP). All Tier 1 (Sprint 1+2) proposals are live as of v4.0.0:

Module	Crate	Key idea
`InvasionFitnessScorer`	atlas-corpus	Replaces raw pheromone softmax; α_ij = ReLU(cos_sim − 0.2) — Lotka-Volterra valid (v4.0.3)
`CognitiveBranching`	atlas-astra	Detects explore_ratio plateau → bifurcates OODA
`CanonicalPheromoneUpdate`	atlas-palace	Principled decay λ = base_rate × exp(−canonical_term) — always positive, smooth (v4.0.3)
`HJConcentrationPrior`	atlas-trm	Hopf-Cole sharpening across TRM recursion steps
`PolymorphicTrainer`	atlas-corpus	k=2,3 morphs (fast/slow/creative) with competition matrix

Mathematical foundation: DeepSupervisionTrainer IS a k-Morphic Trait Substitution System (exact, not analogy). Each N_sup pass = one phenotypic morph. Champagnat Theorem 3.1 derivably explains TRM's >75% gain from deep supervision. Full theory: see research reports.

Hugging Face Model Card

ATLAS models are published to Hugging Face under the openhubresearch organization.

First release: openhubresearch/ATLAS-OLMo-3-7B-Think-v4 — OLMo-3-7B-Think run through the ATLAS v4.0.7 n-morphic framework with BF16 inference (15.4 tok/s A100-SXM4-40GB, W16A32, 562/562 tests, correct post-norm + QK-norm architecture).

---
language: en
license: apache-2.0
library_name: atlas
tags:
  - atlas
  - stigmergic-memory
  - active-inference
  - causal-inference
  - pure-rust
  - zero-dependencies
  - champagnat-morphic
  - bf16-inference
base_model: allenai/OLMo-3-0125-7B
---

Models run through ATLAS carry the full n-morphic framework: InvasionFitnessScorer (Lotka-Volterra valid competition), CanonicalPheromoneUpdate (principled adaptive decay), BarBovier2017Constraints (stability gates), CognitiveBranching (OODA bifurcation), and HJConcentrationPrior (Hopf-Cole sharpening). See atlasagi.org for model releases and the LiveDiscoveryCorpus dataset.

ATLAS Observatory

The ATLAS Observatory is an interactive web demo showcasing the full ATLAS stack — memory palace visualization, live LLM inference, n-morphic evolution, mathematical foundations, and MCP tool playground.

Tab	What it does
🏛️ Palace	3D force-directed graph of the memory palace with pheromone flow particles, bloom lighting, and semantic fly-to navigation
⚒️ Forge	Live chat with OLMo-3-7B-Think via SSE streaming, token confidence visualization, real-time OODA loop display
⚔️ Arena	k=1, 2, 4 morphic population competition with branching events, fitness landscapes, and +38% diversity measurement
📚 Library	Interactive K↔L↔1/ρ sliders, λ decay charts, Fleming–Viot diagrams, and the full crate dependency tree
🔧 Workshop	12 MCP tool cards with live execution, tree/graph result viewers, operation log, and local-first architecture

Tech stack: 13,659 lines · Three.js + 3d-force-graph · OLMo-3-7B-Think (14GB BF16) · Palace REST API · GPU-accelerated on A10G/L4.

🔭 Try it: huggingface.co/spaces/openhubresearch/ATLAS · Website: atlasagi.org

License

Code (crates/, kernels/, scripts/): Apache 2.0
Documentation, paper, figures, datasets: CC BY 4.0

See NOTICE for attribution to incorporated components.

Citation

@software{atlas2026,
  title       = {ATLAS: Active-inference Training with Learned Adaptive Stigmergy},
  author      = {Robin Dey},
  year        = {2026},
  institution = {OpenHub Research, Thailand},
  url         = {https://github.com/web3guru888/ATLAS},
  note        = {Pure Rust LLM training framework. Zero external dependencies.
                 v4.0.7: 21 crates, 562 tests, Champagnat n-morphic framework,
                 BF16 GPU inference — OLMo-3-7B-Think 15.4 tok/s on A100-SXM4-40GB (W16A32).
                 Post-norm + QK-norm architecture for OLMo-2/3 family.
                 Full sampling pipeline: repetition penalty, top-p, top-k, min-p, freq/pres penalty.}
}

Name		Name	Last commit message	Last commit date
Latest commit History 95 Commits
.github/workflows		.github/workflows
crates		crates
docs/dashboard		docs/dashboard
kernels		kernels
scripts		scripts
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
CHARTER.md		CHARTER.md
CITATION.cff		CITATION.cff
CONTRIBUTING.md		CONTRIBUTING.md
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
DEPLOY_REPORT.md		DEPLOY_REPORT.md
LICENSE		LICENSE
LICENSE-DOCS		LICENSE-DOCS
NOTICE		NOTICE
README.md		README.md
REPRODUCIBILITY.md		REPRODUCIBILITY.md
atlas-corpus.json		atlas-corpus.json

Folders and files

Latest commit

History

Repository files navigation

ATLAS

What is ATLAS?

The Big Idea

Pure Rust — Zero Dependencies

Seven Pillars

GPU Inference

A100-SXM4-40GB Benchmark (sm_80, CUDA 13.0)

CUDA Kernel Suite

atlas-api — OpenAI-Compatible Endpoint

Endpoints

Usage Examples

DeepSupervisionTrainer

Horn-Clause Safety Constitution

PalaceBackend Trait

Build Order (7 Stages, ~22 Weeks)

Architecture Diagrams

Paper Strategy

Getting Started

Status — v4.0.7

What Works

Version History

Crate Status

Quick Start

atlas-mcp — Model Context Protocol Server

Benchmarks

Key Numbers

v4.0 — Champagnat n-Morphic Framework ✅ Implemented

Hugging Face Model Card

ATLAS Observatory

License

Citation

About

Topics

Resources

License

Licenses found

Contributing

Uh oh!

Stars

Watchers

Forks

Releases 7

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages