
lzwjava/zz

ZZ

Dataset processing and training utilities for machine learning projects.

Setup

pip install -r requirements.txt

Directory Structure

scripts/
  download/   # Dataset download scripts
  extract/    # Data extraction scripts
  analysis/   # Training analysis and evaluation
logs/         # Training logs and outputs

Usage

Download Datasets

# Download FineWeb dataset
python scripts/download/download_fineweb.py --limit 1000 --output output.txt

# Download with wget scripts
bash scripts/download/wget_fineweb_1.sh
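The `--limit` and `--output` flags suggest that `download_fineweb.py` writes at most a given number of documents to a text file. The sketch below illustrates that pattern under stated assumptions. Records are taken to be dicts with a `text` field (the column FineWeb uses for document content), and `write_limited` is a hypothetical helper, not a function from this repository.

```python
import itertools
from typing import Iterable, Mapping


def write_limited(records: Iterable[Mapping[str, str]], limit: int, output: str) -> int:
    """Write the 'text' field of at most `limit` records to `output`, one per line.

    Hypothetical helper illustrating what `--limit` and `--output` appear to
    control. Returns the number of documents written.
    """
    count = 0
    with open(output, "w", encoding="utf-8") as f:
        # islice stops after `limit` records, so a streamed or endless source is fine
        for record in itertools.islice(records, limit):
            # Collapse every whitespace run (including newlines) into one space
            # so each document occupies exactly one line of the output file
            text = " ".join(record["text"].split())
            f.write(text + "\n")
            count += 1
    return count
```

As one possible source, `write_limited(load_dataset("HuggingFaceFW/fineweb", split="train", streaming=True), 1000, "output.txt")` would read only the first 1000 documents, assuming the Hugging Face `datasets` library is installed. Whether the repository's script works this way is not stated in this README.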

Extract Data

# Extract from parquet files
python scripts/extract/extract_parquet.py

# Extract FineWeb data
python scripts/extract/extract_fineweb.py

Analysis

# Calculate training duration
python scripts/analysis/calculate_duration.py

# Evaluate training metrics
python scripts/analysis/evaluate.py --file logs/train_log_openweb.txt
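This README does not document the log format. Assuming nanoGPT-style lines such as `iter 100: loss 3.2100, time 150.00ms, mfu 30.00%` and `step 500: train loss 3.1000, val loss 3.2000`, a rough sketch of the kind of summary `calculate_duration.py` and `evaluate.py` might produce could look like this. `summarize` and `LogSummary` are hypothetical names, not part of the repository.

```python
import re
import statistics
from dataclasses import dataclass
from typing import Iterable, List

# Assumed (nanoGPT-style) log line formats; adjust if the real logs differ.
ITER_RE = re.compile(r"iter (\d+): loss ([\d.]+), time ([\d.]+)ms")
EVAL_RE = re.compile(r"step (\d+): train loss ([\d.]+), val loss ([\d.]+)")


@dataclass
class LogSummary:
    iterations: int          # number of "iter" lines parsed
    last_iter: int           # iteration number on the last parsed "iter" line
    final_loss: float        # training loss on the last parsed "iter" line
    min_loss: float          # lowest training loss seen
    estimated_hours: float   # median ms per iteration x (last_iter + 1), in hours
    val_losses: List[float]  # validation loss at each eval step, in order


def summarize(lines: Iterable[str]) -> LogSummary:
    losses: List[float] = []
    times_ms: List[float] = []
    val_losses: List[float] = []
    last_iter = 0
    for line in lines:
        if m := ITER_RE.search(line):
            last_iter = int(m.group(1))
            losses.append(float(m.group(2)))
            times_ms.append(float(m.group(3)))
        elif m := EVAL_RE.search(line):
            val_losses.append(float(m.group(3)))
    if not losses:
        raise ValueError("no iteration lines matched the assumed log format")
    # Iterations are numbered from 0, so iter N means N + 1 iterations have run
    estimated_ms = statistics.median(times_ms) * (last_iter + 1)
    return LogSummary(
        iterations=len(losses),
        last_iter=last_iter,
        final_loss=losses[-1],
        min_loss=min(losses),
        estimated_hours=estimated_ms / 3_600_000,
        val_losses=val_losses,
    )
```

Usage would be along the lines of `with open("logs/train_log_openweb.txt") as f: print(summarize(f))`. The duration multiplies the median per-iteration time by the iteration count instead of summing logged times, because nanoGPT-style logs print only every few iterations and the first iteration can be much slower (for example when `torch.compile` is used). It also ignores time spent in evaluation, so treat it as an approximation.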
