From e5c191b98d45c1a9c071599496d40f68f06a2979 Mon Sep 17 00:00:00 2001
From: Elio Neto
Date: Tue, 26 May 2026 16:21:52 -0300
Subject: [PATCH] docs: comprehensive README update with security, testing, and TeamCode info

- Add security features section (TLS, auth, CSRF, audit, rate limiting)
- Update architecture diagram with TLS, quarantine, testing layers
- Update MemTable from BTreeMap to DashMap concurrent
- Add configuration table with key environment variables
- Add testing section with proptest, fuzz, chaos commands
- Add PR validation pipeline details
- Add TeamCode production usage section
- Update roadmap with completed items
- Add REST API examples table
- Restructure and modernize all sections
---
 README.md | 238 +++++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 175 insertions(+), 63 deletions(-)

diff --git a/README.md b/README.md
index 1b9bcd7..740b71e 100644
--- a/README.md
+++ b/README.md
@@ -1,5 +1,5 @@
-

- ApexStore Logo +

+ ApexStore Logo

ApexStore

@@ -27,6 +27,8 @@ ApexStore is a modern, Rust-based storage engine designed for write-heavy worklo
 
 Built from the ground up using **SOLID principles**, it provides a production-grade storage solution that is easy to reason about, test, and maintain, while delivering the performance expected from a systems-level language.
 
+> **🚀 Used in production by [TeamCode](https://github.com/ElioNeto/teamcode)** - an autonomous AI coding agent platform that relies on ApexStore for reliable, low-latency key-value storage.
+
 ## ⚖️ Why ApexStore?
 
 While industry giants like RocksDB or LevelDB focus on extreme complexity, ApexStore offers:
@@ -34,7 +36,8 @@ While industry giants like RocksDB or LevelDB focus on extreme complexity, ApexS
 - **Educational Clarity**: A clean, modular implementation of LSM-Tree that serves as a blueprint for high-performance systems.
 - **Strict SOLID Compliance**: Leveraging Rust's ownership model to enforce clear boundaries between MemTable, WAL, and SSTable layers.
 - **Observability First**: Built-in real-time metrics for memory, disk usage, and WAL health.
-- **Modern Defaults**: Native LZ4 compression, Bloom Filters, and 35+ tunable parameters via environment variables.
+- **Modern Defaults**: Native LZ4 compression, Bloom Filters, encryption-at-rest (AES-GCM), and 45+ tunable parameters via environment variables.
+- **Security Hardened**: TLS/HTTPS support, CORS enforcement, rate limiting, per-IP connection limits, audit logging, and CSRF protection.
 
 ## 📊 Performance Benchmarks
 
@@ -49,40 +52,6 @@ While industry giants like RocksDB or LevelDB focus on extreme complexity, ApexS
-[removed lines: HTML markup lost in extraction]
 ### 📋 YCSB Mixed Workload - `mixed_bench`
 
 *Measured on **Intel Core i5-9300H @ 2.40GHz**, 16 GB DDR4 2667 MHz, HDD SATA 1TB (v2.1.39) - `cargo bench --bench mixed_bench -- --sample-size 10`*
@@ -121,17 +90,39 @@ While industry giants like RocksDB or LevelDB focus on extreme complexity, ApexS
 ## ✨ Key Features
 
 ### 🛠️ Storage Engine
-- **MemTable**: In-memory BTreeMap with configurable size limits.
-- **Write-Ahead Log (WAL)**: ACID-compliant durability with configurable sync modes.
-- **SSTable V2**: Block-based storage with Sparse Indexing and LZ4 Compression.
+- **MemTable**: Concurrent `DashMap`-backed in-memory store - lock-free reads and writes to different keys.
+- **Write-Ahead Log (WAL)**: ACID-compliant durability with configurable sync interval and group commit support.
+- **SSTable V2**: Block-based storage with Sparse Indexing, LZ4 Compression, and AES-GCM encryption.
 - **Bloom Filters**: Drastically reduces unnecessary disk I/O for read operations.
-- **Crash Recovery**: Automatic WAL replay on startup to ensure zero data loss.
+- **Crash Recovery**: Automatic WAL replay on startup + SSTable auto-repair (truncated files detected and quarantined).
+- **Encryption at Rest**: AES-256-GCM encryption enabled by default with configurable keys.
+- **Range Deletion**: Efficient range tombstone support with compaction-aware filtering.
+
+### 🔐 Security
+- **TLS/HTTPS**: Built-in rustls-based HTTPS with configurable certificates and ports.
+- **Authentication**: Bearer token-based auth enabled by default.
+- **CORS**: Configurable cross-origin resource sharing with restrictive defaults.
+- **Rate Limiting**: Sharded per-endpoint rate limiting with per-IP connection limits.
+- **Audit Logging**: Structured audit events with principal tracking for every API operation.
+- **CSRF Protection**: Content-Type guard middleware for state-changing requests.
+- **Secrets Management**: Constant-time token comparison to prevent timing attacks.
+- **File Permissions**: Data files created with 0600/0700 permissions (owner-only access).
 
 ### 🔌 Access Patterns
-- **Interactive CLI**: REPL interface for development and debugging.
-- **REST API**: Full HTTP API with JSON payloads for microservices.
-- **Batch Operations**: Efficient bulk inserts and updates.
-- **Search Capabilities**: Prefix and substring search (Optimized iterators coming in v2.0).
+- **Interactive CLI**: REPL interface with token management (`token create`, `token list`, `token revoke`).
+- **REST API**: Full HTTP API with JSON payloads, batch operations, and paginated scans.
+- **Admin Dashboard**: Real-time web dashboard with live metrics and auto-refresh via fetch().
+- **WebSocket Sync**: Real-time bidirectional sync with authentication.
+- **GraphQL API**: Playground with production guard (disabled when auth enabled).
+- **Change Data Capture (CDC)**: HTTP webhook delivery with configurable retry, auth, and timeout.
+
+### 🔬 Testing Infrastructure
+- **Unit Tests**: 550+ unit tests covering all engine operations.
+- **Property-Based Tests**: `proptest` for engine invariants (put/get/delete roundtrip, multi-key independence).
+- **Fuzz Testing**: `cargo-fuzz` targets for WAL frame format and SSTable block decoding.
+- **Chaos Testing**: Fault injection tests for I/O failures, corruption handling, and crash recovery.
+- **Randomized Testing**: Competitive test with reference `HashMap` model for linearizability verification.
+- **Integration Tests**: SSTable roundtrip, CLI pagination, restart recovery, stress simulation.
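As a rough illustration of the randomized model test listed under Testing Infrastructure, the sketch below applies the same pseudo-random operations to a store and to a reference `HashMap`, then counts reads that disagree. `ToyStore`, `next_rand`, and `run_model_check` are hypothetical names chosen for this example, and `ToyStore` is only a stand-in, not ApexStore's engine API. The run is single-threaded, so it checks sequential agreement with the model rather than linearizability under concurrency.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical stand-in for the engine under test (not the ApexStore API).
#[derive(Default)]
pub struct ToyStore {
    data: BTreeMap<Vec<u8>, Vec<u8>>,
}

impl ToyStore {
    pub fn put(&mut self, key: &[u8], value: &[u8]) {
        self.data.insert(key.to_vec(), value.to_vec());
    }

    pub fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.data.get(key).cloned()
    }

    pub fn delete(&mut self, key: &[u8]) {
        self.data.remove(key);
    }
}

/// Small deterministic LCG so a failing seed can be replayed without extra crates.
fn next_rand(state: &mut u64) -> u64 {
    *state = state
        .wrapping_mul(6364136223846793005)
        .wrapping_add(1442695040888963407);
    *state >> 33
}

/// Applies `ops` random put/delete/get operations to both the store and the
/// `HashMap` model, and returns how many reads disagreed with the model.
pub fn run_model_check(seed: u64, ops: usize) -> usize {
    let mut rng = seed;
    let mut store = ToyStore::default();
    let mut model: HashMap<Vec<u8>, Vec<u8>> = HashMap::new();
    let mut mismatches = 0;

    for _ in 0..ops {
        // A small key space forces overwrites and deletes of existing keys.
        let key = format!("k{}", next_rand(&mut rng) % 16).into_bytes();
        match next_rand(&mut rng) % 3 {
            0 => {
                let value = next_rand(&mut rng).to_le_bytes().to_vec();
                store.put(&key, &value);
                model.insert(key, value);
            }
            1 => {
                store.delete(&key);
                model.remove(&key);
            }
            _ => {
                if store.get(&key) != model.get(&key).cloned() {
                    mismatches += 1;
                }
            }
        }
    }
    mismatches
}
```

Calling `run_model_check(42, 10_000)` should return `0` for a correct store. A real test would swap `ToyStore` for the engine and report the seed on failure so the run can be reproduced.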
 
 ## 🏗️ Architecture
 
@@ -142,44 +133,77 @@ graph TB
     subgraph "Interface Layer"
         CLI[CLI / REPL]
         API[REST API Server]
+        WS[WebSocket Sync]
+    end
+
+    subgraph "Security Layer"
+        TLS[TLS/HTTPS]
+        Auth[Bearer Auth]
+        RateLimit[Rate Limiter]
+        Audit[Audit Log]
+        CORS[CORS Middleware]
     end
 
     subgraph "Core Domain"
         Engine[LSM Engine]
-        MemTable[MemTable<br/>BTreeMap]
+        MemTable[MemTable<br/>DashMap Concurrent]
         LogRecord[LogRecord<br/>Data Model]
+        Compaction[Compaction<br/>Strategy]
     end
 
     subgraph "Storage Layer"
         WAL[Write-Ahead Log<br/>Durability]
         SST[SSTable Manager<br/>V2 Format]
-        Builder[SSTable Builder<br/>Compression]
+        Builder[SSTable Builder<br/>LZ4 + AES-GCM]
+        Quarantine[Quarantine<br/>Auto-Repair]
     end
 
     subgraph "Infrastructure"
         Codec[Serialization<br/>Bincode]
+        Metrics[Prometheus Metrics]
         Error[Error Handling]
         Config[Configuration<br/>Environment]
+        Degradation[Degradation<br/>Manager]
+    end
+
+    subgraph "Testing"
+        Proptest[Property Tests]
+        Fuzz[Fuzz Testing]
+        Chaos[Chaos Testing]
     end
 
-    CLI --> Engine
-    API --> Engine
+    CLI --> Auth --> Engine
+    API --> TLS --> Auth --> Engine
+    WS --> Auth --> Engine
     Engine --> WAL
     Engine --> MemTable
     MemTable -->|Flush| Builder
     Builder --> SST
     Engine -->|Read| MemTable
     Engine -->|Read| SST
+    SST -->|Corrupt| Quarantine
     WAL -.->|Recovery| MemTable
 
+    Engine --> Compaction
+    Engine --> Degradation
     Engine --> Config
+    Engine --> Metrics
     SST --> Codec
     Builder --> Codec
     WAL --> Codec
 
+    API --> RateLimit
+    API --> Audit
+    API --> CORS
+
     style Engine fill:#f9a,stroke:#333,stroke-width:3px
     style WAL fill:#9cf,stroke:#333,stroke-width:2px
     style SST fill:#9cf,stroke:#333,stroke-width:2px
+    style TLS fill:#6c6,stroke:#333,stroke-width:2px
+    style Quarantine fill:#fc6,stroke:#333,stroke-width:2px
+    style Proptest fill:#cfc,stroke:#333,stroke-width:1px
+    style Fuzz fill:#cfc,stroke:#333,stroke-width:1px
+    style Chaos fill:#cfc,stroke:#333,stroke-width:1px
 ```
 
 ## 🚀 Quick Start
@@ -201,6 +225,15 @@ cargo run --release
 # > stats
 ```
 
+### Server with TLS
+```bash
+# Generate self-signed certificates
+openssl req -x509 -newkey rsa:4096 -keyout key.pem -out cert.pem -days 365 -nodes
+
+# Start HTTPS server
+TLS_ENABLED=true TLS_CERT_PATH=cert.pem TLS_KEY_PATH=key.pem cargo run --release --bin server
+```
+
 ## 🐳 Docker Deployment
 
 Run ApexStore as a standalone API server:
@@ -214,7 +247,11 @@ docker run -d \
   --name apexstore-server \
   -p 8080:8080 \
   -e MEMTABLE_MAX_SIZE=33554432 \
+  -e TLS_ENABLED=true \
+  -e TLS_CERT_PATH=/certs/cert.pem \
+  -e TLS_KEY_PATH=/certs/key.pem \
   -v apexstore-data:/data \
+  -v ./certs:/certs:ro \
   elioneto/apexstore:latest
 ```
 
@@ -224,29 +261,70 @@ docker run -d \
 |--------|----------|-------------|
 | `POST` | `/keys` | Insert/Update: `{"key": "k1", "value": "v1"}` |
 | `GET` | `/keys/{key}` | Retrieve value |
-| `GET` | `/stats/all` | Full telemetry (Memory, Disk, WAL) |
+| `GET` | `/health/check` | Comprehensive health (uptime, engine mode, memtable stats) |
+| `GET` | `/stats` | Engine telemetry (memory, disk, WAL, write/read amplification) |
+| `DELETE` | `/keys/{key}` | Delete a key |
+| `POST` | `/keys/batch` | Batch insert/update |
+| `POST` | `/admin/flush` | Force memtable flush |
+| `POST` | `/admin/compact` | Force compaction |
+
+## 🔧 Configuration
+
+ApexStore is configured via environment variables:
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `TLS_ENABLED` | `false` | Enable HTTPS |
+| `TLS_CERT_PATH` | - | Path to TLS certificate (PEM) |
+| `TLS_KEY_PATH` | - | Path to TLS private key (PEM) |
+| `TLS_PORT` | `443` | HTTPS port |
+| `AUTH_ENABLED` | `true` | Enable bearer token authentication |
+| `CORS_ENABLED` | `false` | Enable CORS middleware |
+| `RATE_LIMIT_ENABLED` | `true` | Enable rate limiting |
+| `MAX_CONNECTIONS_PER_IP` | `100` | Max concurrent connections per IP |
+| `ENCRYPTION_ENABLED` | `true` | Enable data encryption at rest |
+| `WAL_SYNC_INTERVAL` | `4` | WAL fsync interval (writes between syncs) |
+
+See [docs/CONFIGURATION.md](docs/CONFIGURATION.md) for a complete list.
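The configuration table can be read as "look up the variable, fall back to the documented default". The sketch below shows that pattern for a few of the listed variables. It is an assumption-laden illustration, not ApexStore's actual loader: `ServerConfig`, `parse_or`, `load_with`, and `load_from_env` are hypothetical names, and it assumes booleans are spelled `true` or `false` and that unparseable values fall back to the default.

```rust
use std::env;
use std::str::FromStr;

/// Hypothetical subset of the settings from the configuration table.
#[derive(Debug, Clone, PartialEq)]
pub struct ServerConfig {
    pub tls_enabled: bool,
    pub tls_port: u16,
    pub auth_enabled: bool,
    pub max_connections_per_ip: u32,
    pub wal_sync_interval: u32,
}

/// Parses `raw` if present and valid; otherwise returns `default`.
fn parse_or<T: FromStr>(raw: Option<String>, default: T) -> T {
    raw.and_then(|s| s.trim().parse::<T>().ok())
        .unwrap_or(default)
}

/// Builds the config from any lookup function, which keeps the logic
/// testable without mutating the process environment.
pub fn load_with<F>(lookup: F) -> ServerConfig
where
    F: Fn(&str) -> Option<String>,
{
    ServerConfig {
        // Defaults mirror the table above.
        tls_enabled: parse_or(lookup("TLS_ENABLED"), false),
        tls_port: parse_or(lookup("TLS_PORT"), 443),
        auth_enabled: parse_or(lookup("AUTH_ENABLED"), true),
        max_connections_per_ip: parse_or(lookup("MAX_CONNECTIONS_PER_IP"), 100),
        wal_sync_interval: parse_or(lookup("WAL_SYNC_INTERVAL"), 4),
    }
}

/// Reads the same settings from the real process environment.
pub fn load_from_env() -> ServerConfig {
    load_with(|name| env::var(name).ok())
}
```

Passing the lookup as a closure is a design choice for the example: tests can supply a fixed map, while production code calls `load_from_env()`.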
 
 ## 📁 Project Structure
 
 ```
 ApexStore/
 ├── src/
-│   ├── core/           # LSM Engine, MemTable, Domain logic
-│   ├── storage/        # WAL, SSTable V2, Block Builder
-│   ├── infra/          # Codec, Error Handling, Config
-│   ├── api/            # Actix-Web Server & Handlers
-│   └── cli/            # REPL Implementation
+│   ├── core/           # LSM Engine, MemTable (DashMap), Compaction, Domain logic
+│   ├── storage/        # WAL, SSTable V2, Block Builder, Encryption, Prefix Compression
+│   ├── infra/          # Codec, Error Handling, Config, Metrics, Scrubber, CDC
+│   ├── api/            # Actix-Web Server, Auth, Rate Limiter, Audit, Health, CORS
+│   └── cli/            # REPL + Token management commands
 ├── docs/               # Detailed documentation & Architecture
-├── tests/              # Integration test suite
+├── tests/              # Integration, Chaos, Proptest, Fuzz test suites
+├── fuzz/               # cargo-fuzz targets for WAL and SSTable
 └── Dockerfile          # Multi-stage build
 ```
 
 ## 🧪 Testing & Quality
 
 ```bash
-cargo test                    # Run all tests
-cargo clippy -- -D warnings   # Linting
-cargo fmt                     # Formatting
+# All tests (unit + integration + proptest + chaos)
+cargo test --all-features
+
+# Property-based tests (engine invariants)
+cargo test proptest --all-features
+
+# Chaos tests (fault tolerance)
+cargo test chaos_ --all-features
+
+# Fuzz testing (requires nightly)
+cargo +nightly fuzz run wal -- -runs=10000
+cargo +nightly fuzz run sstable -- -runs=10000
+
+# Linting & formatting
+cargo clippy --all-targets --all-features -- -D warnings
+cargo fmt --all -- --check
+
+# Security audit
+cargo audit
 ```
 
 ## 🚀 CI/CD & Development Workflow
@@ -261,11 +339,23 @@ graph LR
     D --> E[v2.1.X]
 ```
 
+### PR Validation Pipeline
+
+| Stage | What it checks |
+|-------|----------------|
+| `Rustfmt` | Code formatting |
+| `Clippy` | Lint warnings (deny by default) |
+| `Build and Docs` | Compilation + documentation generation |
+| `Run Tests` | Full test suite (550+ tests) |
+| `Security Audit` | `cargo audit` for dependency vulnerabilities |
+| `Benchmarks` | Performance regression gates (Write, Read, Scan, Mixed, Stress) |
+| `report-status` | Summary with root cause analysis on failure |
+
 ### Development Flow
 
 1. **Create feature branch** from `main`
-2. **Open PR** → CI runs `cargo fmt`, `clippy`, `test`, `build`
-3. **Merge PR** → Auto-increments version in `Cargo.toml`, creates tag & GitHub release
+2. **Open PR** → CI runs all stages above
+3. **Merge PR** → Auto-increments version, creates tag & GitHub release
 
 📖 **Read:** [`MIGRATION_GUIDE.md`](MIGRATION_GUIDE.md) for team workflow
 📂 **Details:** [`.github/workflows/README.md`](.github/workflows/README.md)
@@ -276,9 +366,13 @@ graph LR
 - [x] REST API & Feature Flags
 - [x] Global Block Cache
 - [x] Trunk-based CI/CD with auto-release
-- [ ] **v2.2**: Storage iterators for range queries
-- [ ] **v2.3**: Concurrent read optimization
+- [x] **v2.2**: Concurrent read optimization (RwLock engine core)
+- [x] **v2.3**: WAL I/O decoupling, DashMap MemTable
+- [x] Security audit resolution (40+ issues): TLS, encryption, auth, rate limiting, CSRF, audit
+- [x] Testing infrastructure: proptest, fuzz, chaos testing
 - [ ] **v3.0**: Leveled/Tiered Compaction Strategies
+- [ ] **v3.1**: Distributed replication & consensus
+- [ ] **v3.2**: SQL query layer via Apache DataFusion
 
 ## 🤝 Contributing
 
@@ -295,6 +389,24 @@ Contributions are what make the open-source community an amazing place! Please c
 
 Distributed under the MIT License. See `LICENSE` for more information.
 
+## 💼 Powered By
+
+[HTML markup lost in extraction; surviving text: "TeamCode Logo", "TeamCode"]
+TeamCode - an autonomous AI coding agent platform -
+uses ApexStore as its embedded storage engine for reliable, low-latency key-value storage.
+Managing task state, context, and agent coordination data at scale.
+
 ## 📧 Contact
 
 **Elio Neto** - [GitHub](https://github.com/ElioNeto) - netoo.elio@hotmail.com