
GlobalQO

This is the artifact for the paper "Towards Global Query Optimization: A Learned Framework for Unifying Logical and Physical Search Space".

GlobalQO is an end-to-end learned framework that achieves global query optimization by unifying logical and physical search spaces.

Overview

data/: Contains queries for training and testing, supporting JOB-Light, STATS, and TPC-H benchmarks.

src/: Contains the source code of GlobalQO. Its main components are the rewriter and the optimizer.

Parameters:

Database Connection Parameters:

  • --host: The host address of the target database. Default: localhost
  • --port: The port of the target database. Default: 5432
  • --username: The username used to log into the target database. Default: "root"
  • --password: The password used to log into the target database. Default: ""
  • --db: The database to run tests against. Default: job; options: job, stats, tpch

Mode Parameters:

  • --mode: Running mode. Default: train; options: train, test
  • --training-type: Training type. Default: point; options: point, pair
  • --model-path: Model save/load path. Default: models/job_pairwise_model_best

Optimization Parameters:

  • --ada-factor: Whether to use calibrated scaling. Default: false
  • --timeout: Query timeout in milliseconds. Default: 300000
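The flags above can be combined on one command line. As an illustration only (GlobalQO's actual main.py may read config.py instead, and its internal parser is not shown in this repository's README), here is a minimal argparse sketch of this flag set with the documented defaults:

```python
import argparse

def build_parser():
    """Illustrative parser for the flags listed above; defaults are taken from this README."""
    p = argparse.ArgumentParser(description="GlobalQO (illustrative flag sketch)")
    p.add_argument("--host", default="localhost")
    p.add_argument("--port", type=int, default=5432)
    p.add_argument("--username", default="root")
    p.add_argument("--password", default="")
    p.add_argument("--db", default="job", choices=["job", "stats", "tpch"])
    p.add_argument("--mode", default="train", choices=["train", "test"])
    p.add_argument("--training-type", default="point", choices=["point", "pair"])
    p.add_argument("--model-path", default="models/job_pairwise_model_best")
    # Calibrated scaling is off unless the flag is passed.
    p.add_argument("--ada-factor", action="store_true")
    p.add_argument("--timeout", type=int, default=300000)  # milliseconds
    return p

# Parsing an empty argument list yields the documented defaults.
args = build_parser().parse_args([])
```

For example, `python main.py --db stats --mode test --training-type pair` would select pairwise testing on the STATS benchmark under this sketch.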

GlobalQO Testing

Training Mode

The following commands automatically collect query optimization data and train models. For example, to connect to a PostgreSQL database and train GlobalQO on the JOB-Light dataset:

cd src
python main.py

You can configure different parameters by modifying the config.py file:

# Database configuration
DB = "job"  # Dataset: job, stats, tpch
HOST = "localhost"
PORT = 5432
USER = "root"
PASSWORD = ""

# Running mode
MODE = "train"  # train or test

# Training configuration
TRAINING_TYPE = "pair"  # point or pair
EPOCH = 100
LR = 0.001
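The TRAINING_TYPE setting distinguishes pointwise training (regress each plan's cost directly) from pairwise training (learn only to rank two candidate plans for the same query). The loss functions below are a generic sketch of that distinction, not GlobalQO's actual training code:

```python
import math

def pointwise_loss(pred_cost, true_cost):
    # Pointwise: squared error on a single plan's predicted cost.
    return (pred_cost - true_cost) ** 2

def pairwise_loss(score_a, score_b, a_is_faster):
    # Pairwise: logistic ranking loss over two candidate plans for the
    # same query; the model only needs to order the plans correctly,
    # not predict absolute latencies.
    margin = score_a - score_b if a_is_faster else score_b - score_a
    return math.log(1.0 + math.exp(-margin))
```

Pairwise losses are often preferred for plan selection because choosing the better of two plans does not require calibrated absolute cost estimates.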

Testing Mode

Use trained models for query optimization inference:

# Modify MODE to "test" in config.py, then run
python main.py

Output Description

  • Logical plan exploration history is recorded in the history/rbo/ directory
  • Physical plan exploration history is recorded in the history/cbo/ directory
  • Trained models are saved in the models/ directory

Prerequisites

  1. Install Modified PostgreSQL 13.1

    • Install the modified PostgreSQL 13.1 from this repository's postgresql13.1.tar.gz
    • Follow the standard PostgreSQL compilation and installation process
  2. Load Datasets

  3. Database Configuration

    • Follow the same database configuration as Lero-on-PostgreSQL:
      listen_addresses = '*'
      geqo = off
      max_parallel_workers = 0
      max_parallel_workers_per_gather = 0
      shared_buffers = 4GB
      
  4. Verify Installation

    • In psql, run: SET client_min_messages = DEBUG1;
    • Execute: EXPLAIN SELECT ... on any query
    • Check for debug output like:
      DEBUG:  JOIN level: 2
      DEBUG:  JOIN rels: outer=1, inner=1
      DEBUG:  JOIN between: outer=c, inner=p
      DEBUG:  JOIN Card: 169936
      
  5. Install pg_hint_plan

  6. Install Environment

    • Install the environment by running conda env create -f env.yml
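The settings in step 3 can be placed in postgresql.conf, or applied with PostgreSQL's standard ALTER SYSTEM command. A sketch, assuming psql can reach the server as a superuser with the connection defaults from this README:

```shell
# Apply the Lero-style settings from step 3 without editing postgresql.conf by hand.
psql -h localhost -p 5432 -U root -d job <<'SQL'
ALTER SYSTEM SET listen_addresses = '*';
ALTER SYSTEM SET geqo = off;
ALTER SYSTEM SET max_parallel_workers = 0;
ALTER SYSTEM SET max_parallel_workers_per_gather = 0;
ALTER SYSTEM SET shared_buffers = '4GB';
SQL
# listen_addresses and shared_buffers only take effect after a server restart.
pg_ctl restart -D "$PGDATA"
```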

Baseline Methods

About

Leopard: A Learned Optimizer with an Agile and Robust Design
