AstroML is a research-driven Python framework for building dynamic graph machine learning models on the Stellar blockchain, the network supported by the Stellar Development Foundation.
It treats blockchain data as a multi-asset, time-evolving graph, enabling ML research on transaction networks, including fraud detection, anomaly detection, and behavioral modeling.
AstroML provides end-to-end tooling for:
- Ledger ingestion and normalization
- Dynamic transaction graph construction
- Feature engineering for blockchain accounts
- Graph Neural Networks (GNNs)
- Self-supervised node embeddings
- Anomaly detection
- Temporal modeling
- Reproducible ML experimentation
Blockchain networks are naturally graph-structured systems:
| Blockchain Concept | Graph Representation |
|---|---|
| Accounts | Nodes |
| Transactions | Directed edges |
| Assets | Edge types |
| Time | Dynamic dimension |
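
The mapping in the table can be sketched in a few lines of plain Python. This is only an illustration: the `Edge` fields follow the `(src, dst, timestamp, asset, amount)` tuple used later in this README, while the `edges_by_asset` helper and the account names are hypothetical and not part of AstroML's API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    src: str        # sending account (node)
    dst: str        # receiving account (node)
    timestamp: int  # Unix seconds (the dynamic dimension)
    asset: str      # asset code (edge type)
    amount: float


def edges_by_asset(edges):
    """Group directed (src, dst) pairs by asset: one edge set per edge type."""
    grouped = defaultdict(list)
    for e in edges:
        grouped[e.asset].append((e.src, e.dst))
    return dict(grouped)


# Made-up payments between made-up accounts.
payments = [
    Edge("GA_ALICE", "GB_BOB", 1_700_000_000, "XLM", 25.0),
    Edge("GB_BOB", "GC_CAROL", 1_700_000_060, "USDC", 10.0),
    Edge("GA_ALICE", "GC_CAROL", 1_700_000_120, "XLM", 5.0),
]
typed = edges_by_asset(payments)
nodes = sorted({account for e in payments for account in (e.src, e.dst)})
```

Keeping the asset on the edge rather than on the node is what makes the graph multi-asset: two accounts can be connected by several edges of different types.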
Most analytics tools rely on static heuristics or SQL queries.
AstroML instead enables:
- Dynamic graph learning
- Temporal GNNs
- Representation learning
- Research-grade experimentation
AstroML is designed for:
- ML researchers
- Graph ML engineers
- Fraud detection teams
- Blockchain data scientists
┌─────────────────────────────────────────────────────────────────────────┐
│                   AstroML: Ingestion → Graph → Train                    │
└─────────────────────────────────────────────────────────────────────────┘
                             ┌───────────────┐
                             │    Stellar    │
                             │    Ledgers    │
                             └───────┬───────┘
                                     │
                    ┌────────────────▼────────────────┐
                    │  1. INGESTION LAYER             │
                    │  ├─ Ledger backfill (Polars)    │
                    │  ├─ Incremental ingestion       │
                    │  ├─ State tracking (idempotent) │
                    │  └─ PostgreSQL storage          │
                    └────────────────┬────────────────┘
                                     │
                    ┌────────────────▼────────────────┐
                    │  2. NORMALIZATION LAYER         │
                    │  ├─ Raw Stellar schema          │
                    │  │  (Ledger, Transaction, Op)   │
                    │  ├─ Graph mirror layer          │
                    │  │  (GraphAccount, GraphEdge)   │
                    │  └─ Composite indexes           │
                    │     (account_id, timestamp)     │
                    └────────────────┬────────────────┘
                                     │
                    ┌────────────────▼────────────────┐
                    │  3. GRAPH BUILDING LAYER        │
                    │  ├─ Time-windowed snapshots     │
                    │  ├─ Edge construction           │
                    │  ├─ Node indexing               │
                    │  └─ Graph validation            │
                    └────────────────┬────────────────┘
                                     │
                    ┌────────────────▼────────────────┐
                    │  4. FEATURE ENGINEERING         │
                    │  ├─ Transaction frequency       │
                    │  ├─ Asset diversity             │
                    │  ├─ Structural importance       │
                    │  │  (degree, betweenness, PR)   │
                    │  ├─ Feature store & versioning  │
                    │  └─ Point-in-time queries       │
                    └────────────────┬────────────────┘
                                     │
                    ┌────────────────▼────────────────┐
                    │  5. TRAINING LAYER              │
                    │  ├─ Temporal train/test split   │
                    │  ├─ Link prediction task        │
                    │  ├─ Negative sampling           │
                    │  ├─ PyTorch Geometric models    │
                    │  │  (GCN, GraphSAGE, GAT)       │
                    │  └─ Early stopping              │
                    └────────────────┬────────────────┘
                                     │
                    ┌────────────────▼────────────────┐
                    │  6. BENCHMARKING & EVALUATION   │
                    │  ├─ Reproducible configs        │
                    │  ├─ Random seed tracking        │
                    │  ├─ Metric computation          │
                    │  │  (AUC, Precision, Recall)    │
                    │  ├─ Memory profiling            │
                    │  └─ Result persistence          │
                    └────────────────┬────────────────┘
                                     │
                             ┌───────▼───────┐
                             │   Baseline    │
                             │    Results    │
                             └───────────────┘
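
As a rough illustration of the graph building layer (time-windowed snapshots, edge construction, node indexing), the sketch below filters edges to a half-open time window and assigns each account a dense integer index. The `build_snapshot` function and its arguments are assumptions made for this example; AstroML's own snapshot code may be organized differently.

```python
from datetime import datetime, timedelta, timezone


def build_snapshot(edges, window_end, window):
    """Keep edges with window_end - window <= ts < window_end and index their nodes.

    `edges` is an iterable of (src, dst, timestamp) tuples (hypothetical layout).
    Returns (node_index, indexed_edges).
    """
    start = window_end - window
    in_window = [e for e in edges if start <= e[2] < window_end]
    node_index = {}
    for src, dst, _ts in in_window:
        for account in (src, dst):
            # Assign the next free integer the first time an account is seen.
            node_index.setdefault(account, len(node_index))
    indexed = [(node_index[s], node_index[d]) for s, d, _ts in in_window]
    return node_index, indexed


now = datetime(2024, 3, 1, tzinfo=timezone.utc)
edges = [
    ("A", "B", now - timedelta(days=45)),  # older than the 30-day window
    ("B", "C", now - timedelta(days=10)),
    ("C", "A", now - timedelta(days=1)),
]
node_index, indexed = build_snapshot(edges, now, timedelta(days=30))
```

Because indices are assigned per snapshot, the same account can map to different integers in different windows; any feature lookup has to go through `node_index` rather than assume a global numbering.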
Stellar Ledger Data
    ↓
[Ingestion Service]
    ├─ Fetch ledgers (1000000-1100000)
    ├─ Track state (.astroml_state/ingestion_state.json)
    └─ Store in PostgreSQL
    ↓
[Database Schema]
    ├─ Raw Layer: Ledger, Transaction, Operation, Account, Asset
    ├─ Graph Layer: GraphAccount, GraphEdge, GraphTransactionDetail
    └─ Indexes: (account_id, timestamp) composite
    ↓
[Graph Snapshot]
    ├─ Query operations by time window
    ├─ Create Edge objects (src, dst, timestamp, asset, amount)
    ├─ Build node_index mapping
    └─ Validate graph (isolated nodes, self-loops, density)
    ↓
[Feature Store]
    ├─ Compute node features (frequency, diversity, centrality)
    ├─ Compute edge features (asset type, amount, direction)
    ├─ Version features with metadata
    └─ Store in SQLite + Parquet
    ↓
[Temporal Split]
    ├─ Sort edges by timestamp
    ├─ Split at cutoff (80% train, 20% test)
    └─ Ensure no future data leaks into training
    ↓
[Link Prediction Task]
    ├─ Context window: edges before cutoff
    ├─ Future window: edges after cutoff
    ├─ Positive labels: future edges
    ├─ Negative sampling: random non-edges
    └─ Binary classification objective
    ↓
[Model Training]
    ├─ LinkPredictor(encoder + decoder)
    ├─ Adam optimizer with early stopping
    ├─ Compute AUC, Precision, Recall, F1
    └─ Track training/validation losses
    ↓
[Benchmark Results]
    ├─ config.json (full configuration + seed)
    ├─ result.json (metrics + performance)
    └─ metadata.json (run_id, timestamp, linking files)
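
The temporal split and negative sampling steps in the flow above can be sketched as follows. This is a minimal pure-Python illustration, not AstroML's implementation: `temporal_split` and `sample_negatives` are hypothetical names, and the negative sampler enumerates every candidate pair, which is only practical for small graphs.

```python
import random


def temporal_split(edges, train_fraction=0.8):
    """Sort (src, dst, timestamp) edges by time and cut once, so training never sees the future."""
    ordered = sorted(edges, key=lambda e: e[2])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


def sample_negatives(num_nodes, known_edges, k, seed=42):
    """Draw up to k directed node pairs that are not self-loops and never appear as edges."""
    known = set(known_edges)
    candidates = [
        (s, d)
        for s in range(num_nodes)
        for d in range(num_nodes)
        if s != d and (s, d) not in known
    ]
    rng = random.Random(seed)  # seeded so the sample is reproducible
    return rng.sample(candidates, min(k, len(candidates)))


# Ten made-up edges on five nodes with strictly increasing timestamps.
edges = [(i % 5, (i + 1) % 5, 100 + i) for i in range(10)]
train, test = temporal_split(edges)
known_pairs = {(s, d) for s, d, _ts in edges}
negatives = sample_negatives(5, known_pairs, k=len(test))
```

Splitting by position in the sorted list can place edges that share a timestamp on both sides of the cut; splitting on a timestamp value instead avoids that at the cost of an uneven ratio.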
astroml/
├── ingestion/                    # Ledger ingestion & state tracking
│   ├── service.py                # IngestionService (incremental, idempotent)
│   ├── state.py                  # StateStore (tracks processed ledgers)
│   └── backfill.py               # Bulk ledger loading
├── db/                           # Database layer
│   ├── schema.py                 # SQLAlchemy ORM models
│   └── session.py                # Database connection management
├── features/                     # Feature engineering
│   ├── feature_store.py          # Enterprise feature management
│   ├── graph/
│   │   └── snapshot.py           # Time-windowed graph construction
│   ├── frequency.py              # Transaction frequency features
│   ├── asset_diversity.py
│   └── gnn/                      # Graph neural network layers
├── models/                       # ML models
│   ├── link_predictor.py
│   ├── gcn.py
│   ├── sage.py
│   └── deep_svdd.py
├── tasks/                        # Training tasks
│   └── link_prediction_task.py
├── training/                     # Training utilities
│   ├── temporal_split.py         # Prevent data leakage
│   └── train_link_prediction.py
├── benchmarking/                 # Benchmarking framework
│   ├── core.py                   # ModelBenchmark orchestrator
│   ├── config.py                 # Configuration management
│   └── metrics.py                # Metric computation
├── quick_start.py                # Quick start pipeline
└── cli.py                        # Command-line interface
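
The layout above describes ingestion as incremental and idempotent, with a state store that tracks processed ledgers. One way such behavior could work is sketched below. The `LedgerState` class, its JSON layout, and the `ingest` function are assumptions for illustration only and do not reflect the actual `StateStore` or `IngestionService` interfaces.

```python
import json
import tempfile
from pathlib import Path


class LedgerState:
    """Hypothetical state file that remembers the last processed ledger sequence."""

    def __init__(self, path):
        self.path = Path(path)

    def last_processed(self):
        if not self.path.exists():
            return None
        return json.loads(self.path.read_text())["last_ledger"]

    def mark_processed(self, sequence):
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps({"last_ledger": sequence}))


def ingest(state, start, end, sink):
    """Process ledgers in [start, end], skipping those already recorded in `state`."""
    last = state.last_processed()
    first = start if last is None else max(start, last + 1)
    for seq in range(first, end + 1):
        sink.append(seq)           # placeholder for the real fetch-and-store step
        state.mark_processed(seq)  # record progress after each ledger
    return max(0, end + 1 - first)


with tempfile.TemporaryDirectory() as tmp:
    state = LedgerState(Path(tmp) / "ingestion_state.json")
    stored = []
    first_run = ingest(state, 1000000, 1000004, stored)
    second_run = ingest(state, 1000000, 1000006, stored)  # overlapping range
```

Rerunning with an overlapping range only processes the ledgers that were not seen before, which is what makes a backfill safe to restart after a failure.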
# Run quick start with default settings (100 ledgers, 50 accounts, 10 epochs)
make quickstart
# Run with more data for thorough testing
make quickstart-verbose

# Run quick start with default settings
python -m astroml.quick_start
# Run with custom parameters
python -m astroml.quick_start --num-ledgers 200 --num-accounts 100 --epochs 20 --seed 42

# Run quick start command
python -m astroml quickstart --num-ledgers 100 --num-accounts 50 --epochs 10 --seed 42

The quick start pipeline:
- Generates sample data: Creates 100 synthetic ledgers with 50 accounts and realistic transactions
- Builds transaction graph: Constructs a time-windowed graph with ~2000 edges
- Validates graph: Checks for isolated nodes, self-loops, and computes statistics
- Trains baseline model: Trains a LinkPredictor model for 10 epochs
- Saves reproducible results: Stores config, results, and metadata for reproducibility
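
The validation step (isolated nodes, self-loops, and basic statistics) might look something like the sketch below. The `validate_graph` function and the shape of its report are hypothetical; density is computed here as unique directed non-loop edges divided by `n * (n - 1)`, which is one common convention and may differ from what AstroML reports.

```python
def validate_graph(num_nodes, edges):
    """Report isolated nodes, self-loop count, and directed density for an indexed edge list."""
    touched = set()
    self_loops = 0
    for src, dst in edges:
        touched.update((src, dst))
        if src == dst:
            self_loops += 1
    isolated = [n for n in range(num_nodes) if n not in touched]
    # Ignore duplicates and self-loops when measuring density.
    unique = {(s, d) for s, d in edges if s != d}
    possible = num_nodes * (num_nodes - 1)
    density = len(unique) / possible if possible else 0.0
    return {"isolated_nodes": isolated, "self_loops": self_loops, "density": density}


# Four nodes: node 3 has no edges, node 2 has a self-loop, and (0, 1) is duplicated.
report = validate_graph(4, [(0, 1), (1, 2), (2, 2), (0, 1)])
```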
Output:
benchmark_results/quickstart/
├── config.json      # Full configuration with random seed
├── result.json      # Training metrics and performance
└── metadata.json    # Run metadata linking config and result
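
A minimal sketch of how a run could be written out as these three files is shown below. Only the file names and the `run_id` and `timestamp` fields come from this README; the `save_run` helper, the `config_file` and `result_file` keys, and the placeholder metric value are assumptions, so the real files may contain different fields.

```python
import json
import random
import tempfile
import uuid
from datetime import datetime, timezone
from pathlib import Path


def save_run(out_dir, config, metrics):
    """Write config.json, result.json, and a metadata.json that links the two."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "config.json").write_text(json.dumps(config, indent=2, sort_keys=True))
    (out / "result.json").write_text(json.dumps(metrics, indent=2, sort_keys=True))
    metadata = {
        "run_id": uuid.uuid4().hex,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "config_file": "config.json",  # hypothetical key names
        "result_file": "result.json",
    }
    (out / "metadata.json").write_text(json.dumps(metadata, indent=2))
    return metadata


config = {"num_ledgers": 100, "num_accounts": 50, "epochs": 10, "seed": 42}
random.seed(config["seed"])  # seed every source of randomness from the saved config

with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp) / "quickstart"
    meta = save_run(run_dir, config, {"auc": 0.0})  # placeholder metric, not a real result
    reloaded = json.loads((run_dir / "config.json").read_text())
```

Storing the seed inside the config file, rather than only on the command line, means a run can be repeated from the saved artifacts alone.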
Example output:
================================================================================
AstroML Quick Start: Ingestion → Graph → Train Pipeline
================================================================================
[Step 1/5] Generating sample ledger data...
Generated 100 ledgers with 50 accounts
[Step 2/5] Building transaction graph...
Built graph with 2000 edges and 50 nodes
[Step 3/5] Creating benchmark configuration...
[Step 4/5] Training baseline model...
Epoch 0: Train Loss = 0.6931, Val Loss = 0.6892
Epoch 5: Train Loss = 0.4521, Val Loss = 0.4612
Training complete. Best metrics: {'auc': 0.92, 'precision': 0.88, 'recall': 0.85}
[Step 5/5] Saving benchmark results...
Saved config to benchmark_results/quickstart/config.json
Saved result to benchmark_results/quickstart/result.json
Saved metadata to benchmark_results/quickstart/metadata.json
✓ Quick start completed successfully!
Results saved to: benchmark_results/quickstart
================================================================================
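
For readers unfamiliar with the metrics in the sample output, the sketch below computes AUC, precision, and recall for a link prediction task from scratch. It is a teaching example with made-up labels and scores; the function names are hypothetical, and AstroML's own metric code may rely on a library instead.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def precision_recall(labels, scores, threshold=0.5):
    """Precision and recall when scores >= threshold are predicted as edges."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for y, p in zip(labels, predicted) if y == 1 and p)
    fp = sum(1 for y, p in zip(labels, predicted) if y == 0 and p)
    fn = sum(1 for y, p in zip(labels, predicted) if y == 1 and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# 1 = real future edge, 0 = sampled non-edge; scores are invented for the example.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
auc = roc_auc(labels, scores)
precision, recall = precision_recall(labels, scores)
```

AUC does not depend on a threshold, while precision and recall do, so the two kinds of metric can move independently when the decision threshold changes.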
For the quickest setup with all dependencies, use Docker:
# Clone and navigate to repository
git clone https://github.com/Traqora/astroml.git
cd astroml
# Start with Docker
cp .env.example .env
./scripts/docker-start.sh core
# Access services
curl http://localhost:8000 # API
open http://localhost:3000 # Grafana

Full Docker Setup: See DOCKER.md for comprehensive documentation, including:
- Docker Quick Reference - Quick commands and common tasks
- Environment Configuration - Configuration guide
- Production Deployment - Production setup
- Troubleshooting - Common issues and solutions
git clone https://github.com/Traqora/astroml.git
cd astroml
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Note: Three requirements files are available. See REQUIREMENTS.md for guidance on which to use for your environment (GPU training, CPU-only, or minimal config-only).
A lightweight Docker Compose setup is provided to spin up PostgreSQL and Redis with persistent volumes. Run:
docker compose up -d

This starts only the database and cache, so you can run Python scripts and training natively on your machine. Alternatively, configure your own database and update config/database.yaml.
Backfill ledgers:
python -m astroml.ingestion.backfill \
--start-ledger 1000000 \
--end-ledger 1100000

Create a rolling time window graph:
python -m astroml.graph.build_snapshot --window 30d

Create benchmark datasets by injecting controlled fraud structures into a clean ledger copy:
python -m astroml.ingestion.synthetic_fraud_injector \
--input data/clean_ledger.jsonl \
--output data/ledger_with_fraud.jsonl \
--summary outputs/fraud_injection_summary.json \
--sybil-clusters 3 \
--sybil-cluster-size 8 \
--wash-loops 2 \
--wash-loop-size 5

The injector appends transactions tagged with `synthetic_fraud=true` and `fraud_pattern` (`sybil_cluster` or `wash_trading_loop`) for downstream benchmarking.
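
To make the two fraud patterns concrete, the sketch below generates tagged transactions for one wash-trading loop and one Sybil cluster. Only the `synthetic_fraud` and `fraud_pattern` tags and their two values come from this README; the function names, the other record fields, and the account naming are assumptions for illustration and may not match the injector's output format.

```python
def wash_trading_loop(accounts, start_ts, amount=100.0, asset="XLM"):
    """Cycle the same amount A -> B -> ... -> A, tagging each transaction as synthetic."""
    txs = []
    for i, src in enumerate(accounts):
        dst = accounts[(i + 1) % len(accounts)]  # last account pays the first
        txs.append({
            "src": src, "dst": dst, "asset": asset, "amount": amount,
            "timestamp": start_ts + i,
            "synthetic_fraud": True,
            "fraud_pattern": "wash_trading_loop",
        })
    return txs


def sybil_cluster(controller, size, start_ts, amount=1.0, asset="XLM"):
    """One controller funds `size` fresh accounts in quick succession."""
    return [
        {
            "src": controller, "dst": f"{controller}_SYBIL_{i}",
            "asset": asset, "amount": amount,
            "timestamp": start_ts + i,
            "synthetic_fraud": True,
            "fraud_pattern": "sybil_cluster",
        }
        for i in range(size)
    ]


# Sizes mirror the CLI example: loops of 5 accounts, clusters of 8 accounts.
loop = wash_trading_loop([f"WASH_{i}" for i in range(5)], start_ts=1_700_000_000)
cluster = sybil_cluster("CTRL", 8, start_ts=1_700_000_100)
```

Because every injected record carries the two tags, a benchmark can recover ground-truth labels by filtering on `synthetic_fraud` without needing a separate label file.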
python -m astroml.training.train_gcn

- Liquidity monitoring for the Stellar Community Fund
- Fraud / scam detection
- Account clustering
- Transaction risk scoring
- Temporal behavior modeling
- Self-supervised embeddings
- Network anomaly detection
AstroML emphasizes:
- Reproducibility
- Modular experimentation
- Scalable ingestion
- Temporal graph learning
- Production-ready ML pipelines
- Python
- PyTorch / PyTorch Geometric
- PostgreSQL
- NetworkX / graph tooling
- Real-time streaming ingestion
- Temporal GNN models
- Contrastive learning pipelines
- Feature store
- Model benchmarking suite
- Docker deployment
Contributions are welcome!
fork → branch → commit → PR

Please open issues for bugs or feature requests.
MIT License