This is the artifact for the paper "Towards Global Query Optimization: A Learned Framework for Unifying Logical and Physical Search Space".
GlobalQO is an end-to-end learned framework that achieves global query optimization by unifying logical and physical search spaces.
- `data/`: Contains queries for training and testing; supports the JOB-Light, STATS, and TPC-H benchmarks.
- `src/`: Contains the source code of GlobalQO. The main components are the rewriter and the optimizer.
Database Connection Parameters:
- `--host`: The host address of the target database. Default: localhost
- `--port`: The port of the target database. Default: 5432
- `--username`: The username used to log into the target database. Default: "root"
- `--password`: The password used to log into the target database. Default: ""
- `--db`: The database name to run tests. Default: job, options: stats, tpch
Mode Parameters:
- `--mode`: Running mode. Default: train, options: test
- `--training-type`: Training type. Default: point, options: pair
- `--model-path`: Model save/load path. Default: models/job_pairwise_model_best
Optimization Parameters:
- `--ada-factor`: Whether to use calibrated scaling. Default: false
- `--timeout`: Query timeout in milliseconds. Default: 300000
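Taken together, the flags above map naturally onto a command-line parser. The sketch below is hypothetical — flag names and defaults are taken from this README, but the actual wiring in `main.py` may differ:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Sketch of a CLI parser for the flags listed above (names and
    defaults from the README; the real main.py may differ)."""
    p = argparse.ArgumentParser(description="GlobalQO runner (sketch)")
    # Database connection parameters
    p.add_argument("--host", default="localhost")
    p.add_argument("--port", type=int, default=5432)
    p.add_argument("--username", default="root")
    p.add_argument("--password", default="")
    p.add_argument("--db", default="job", choices=["job", "stats", "tpch"])
    # Mode parameters
    p.add_argument("--mode", default="train", choices=["train", "test"])
    p.add_argument("--training-type", default="point", choices=["point", "pair"])
    p.add_argument("--model-path", default="models/job_pairwise_model_best")
    # Optimization parameters
    p.add_argument("--ada-factor", action="store_true",
                   help="enable calibrated scaling (off by default)")
    p.add_argument("--timeout", type=int, default=300000,
                   help="query timeout in milliseconds")
    return p

# Example: run inference against the STATS database
args = build_parser().parse_args(["--db", "stats", "--mode", "test"])
print(args.db, args.mode, args.timeout)  # stats test 300000
```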
The following commands automatically collect query optimization data and train models. For example, to connect to a PostgreSQL database and run GlobalQO training on the JOB-Light dataset:
```
cd src
python main.py
```

You can configure different parameters by modifying the `config.py` file:

```python
# Database configuration
DB = "job"        # Dataset: job, stats, tpch
HOST = "localhost"
PORT = 5432
USER = "root"
PASSWORD = ""

# Running mode
MODE = "train"    # train or test

# Training configuration
TRAINING_TYPE = "pair"  # point or pair
EPOCH = 100
LR = 0.001
```

Use trained models for query optimization inference:
```
# Modify MODE to "test" in config.py, then run
python main.py
```

- Logical plan exploration history is recorded in the `history/rbo/` directory
- Physical plan exploration history is recorded in the `history/cbo/` directory
- Trained models are saved in the `models/` directory
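The default model path `models/job_pairwise_model_best` suggests a `<db>_<type>_model_best` naming scheme. A hypothetical helper following that convention (the actual code may name checkpoint files differently):

```python
def default_model_path(db: str, training_type: str, root: str = "models") -> str:
    """Hypothetical helper composing a model save/load path from the
    dataset and training type, mirroring the README's default path
    'models/job_pairwise_model_best'. Not part of the actual codebase."""
    suffix = {"point": "pointwise", "pair": "pairwise"}[training_type]
    return f"{root}/{db}_{suffix}_model_best"

print(default_model_path("job", "pair"))  # models/job_pairwise_model_best
```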
- Install Modified PostgreSQL 13.1
  - Install the modified PostgreSQL 13.1 from this repository's `postgresql13.1.tar.gz`
  - Follow the standard PostgreSQL compilation and installation process
- Load Datasets
  - JOB-Light: Follow instructions from join-order-benchmark; queries are from End-to-End-CardEst-Benchmark
  - STATS: Use the same dataset as Lero-on-PostgreSQL
  - TPC-H: Download from the official TPC-H website
- Database Configuration
  - Follow the same database configuration as Lero-on-PostgreSQL:

    ```
    listen_addresses = '*'
    geqo = off
    max_parallel_workers = 0
    max_parallel_workers_per_gather = 0
    shared_buffers = 4GB
    ```
- Verify Installation
  - In psql, run:

    ```sql
    SET client_min_messages = DEBUG1;
    ```

  - Execute `EXPLAIN SELECT ...` on any query
  - Check for debug output like:

    ```
    DEBUG: JOIN level: 2
    DEBUG: JOIN rels: outer=1, inner=1
    DEBUG: JOIN between: outer=c, inner=p
    DEBUG: JOIN Card: 169936
    ```
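When checking many queries, the DEBUG lines can also be verified programmatically. An illustrative sketch (not part of GlobalQO itself) that pulls the join level and estimated cardinality out of the sample output above:

```python
import re

# Sample debug output from the modified PostgreSQL (copied from this README)
sample = """\
DEBUG: JOIN level: 2
DEBUG: JOIN rels: outer=1, inner=1
DEBUG: JOIN between: outer=c, inner=p
DEBUG: JOIN Card: 169936
"""

def parse_join_debug(text: str) -> dict:
    """Extract the join level and estimated cardinality from the
    DEBUG lines emitted by the modified server."""
    info = {}
    m = re.search(r"JOIN level: (\d+)", text)
    if m:
        info["level"] = int(m.group(1))
    m = re.search(r"JOIN Card: (\d+)", text)
    if m:
        info["card"] = int(m.group(1))
    return info

print(parse_join_debug(sample))  # {'level': 2, 'card': 169936}
```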
- Install pg_hint_plan
  - Install the pg_hint_plan extension
- Install Environment
  - Install the environment by running:

    ```
    conda env create -f env.yml
    ```
- AutoSteer: IntelLabs/Auto-Steer
- LLM-R2: DAMO-NLP-SG/LLM-R2
- Lero: AlibabaIncubator/Lero-on-PostgreSQL