README Roadmap and Intended PR Schedule #10
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -5,3 +5,4 @@ | |
| /docs/build/ | ||
| Manifest.toml | ||
| build/ | ||
| .DS_Store | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,7 +1,15 @@ | ||
| name = "RidgeRegression" | ||
| uuid = "739161c8-60e1-4c49-8f89-ff30998444b1" | ||
| authors = ["Vivak Patel <vp314@users.noreply.github.com>"] | ||
| version = "0.1.0" | ||
| authors = ["Eton Tackett <etont@icloud.com>", "Vivak Patel <vp314@users.noreply.github.com>"] | ||
|
|
||
| [deps] | ||
| CSV = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b" | ||
| DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0" | ||
| Downloads = "f43a241f-c20a-4ad4-852c-f6b1247861c6" | ||
|
|
||
| [compat] | ||
| CSV = "0.10.15" | ||
| DataFrames = "1.8.1" | ||
| Downloads = "1.7.0" | ||
| julia = "1.12.4" |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -5,3 +5,180 @@ | |
| [Build Status](https://github.com/vp314/RidgeRegression.jl/actions/workflows/CI.yml?query=branch%3Amain) | ||
| [Coverage](https://codecov.io/gh/vp314/RidgeRegression.jl) | ||
| [Code Style: Blue](https://github.com/invenia/BlueStyle) | ||
|
|
||
| # Project Overview | ||
|
|
||
| This project investigates the performance of numerical algorithms for solving the ridge regression problem under varying dimension regimes and conditioning levels. | ||
|
|
||
| # Directory Structure | ||
|
|
||
| The source-code layout will be structured as follows: | ||
| ```text | ||
| . | ||
| ├── Project.toml | ||
| │ | ||
| ├── src | ||
| │ ├── RidgeRegression.jl | ||
| │ ├── units.jl | ||
| │ ├── treatments.jl | ||
| │ ├── measurements.jl | ||
| │ └── algorithms | ||
| │ ├── closed_form.jl | ||
| │ ├── gradient_descent.jl | ||
| │ ├── stochastic_gradient_descent.jl | ||
| │ └── bidiagonalization.jl | ||
| │ | ||
| ├── test | ||
| │ ├── Project.toml | ||
| │ ├── runtests.jl | ||
| │ └── src | ||
| │ ├── RidgeRegression_test.jl | ||
| │ ├── units | ||
| │ │ ├── units_dataset_tests.jl | ||
| │ │ ├── units_one_hot_encode_tests.jl | ||
| │ │ ├── units_end_to_end_tests.jl | ||
| │ │ └── units_load_csv_dataset_tests.jl | ||
| │ ├── treatments | ||
| │ │ └── treatments_test.jl | ||
| │ ├── measurements | ||
| │ │ └── measurements_test.jl | ||
| │ └── algorithms | ||
| │ ├── closed_form_test.jl | ||
| │ ├── gradient_descent_test.jl | ||
| │ ├── stochastic_gradient_descent_test.jl | ||
| │ └── bidiagonalization | ||
| │ ├── bidiagonalization_compute_givens_tests.jl | ||
| │ ├── bidiagonalization_rotate_rows_tests.jl | ||
| │ ├── bidiagonalization_rotate_cols_tests.jl | ||
| │ ├── bidiagonalization_apply_Ht_to_b_tests.jl | ||
| │ └── bidiagonalization_with_H_tests.jl | ||
| │ | ||
| └── docs | ||
| ├── make.jl | ||
| └── src | ||
| ├── design.md | ||
| ├── getting_started_guide.md | ||
| ├── experimental_pipeline.md | ||
| ├── algorithm_explanations.md | ||
| ├── dataset.md | ||
| ├── measurements.md | ||
| ├── output.md | ||
| └── index.md | ||
|
| ``` | ||

Comment on lines +56 to +66 (Owner):
Do some background research on documentation for software. Think about what makes the most sense for you. Also think about the fact that you have an experiment that this software runs. So it should clearly indicate entry points to the code and exit points (e.g., tables, figures, datasets, etc.). So your documentation should also explain this as well. For instance, "design.md" is not something you would find in standard software documentation but it is something you would find here.
| # PR Schedule and Roadmap | ||
|
|
||
| ## PR 1: Experimental Design | ||
| **Expected Date:** June 19, 2026 | ||
|
|
||
| ### [WHY] | ||
| A rigorous experimental framework ensures that the ridge regression algorithms are evaluated under identical conditions, enabling fair and reproducible comparisons across methods. | ||
|
|
||
| ### [WHAT] | ||
| This PR introduces the experimental design. The primary deliverable is a design document that specifies the structure of the study, including the experimental units, treatments, blocking factors, measurements, and observations that will be used to evaluate ridge regression algorithms. | ||
|
|
||
| ### [HOW] | ||
| This PR accomplishes the experimental design by creating a formal design document in `docs/src/design.md`. The document will define the experimental units, treatments, blocking factors, measurements, and observations used throughout the project. It will specify the problem dimensions, conditioning regimes, noise levels, regularization parameters, and benchmark metrics that will be used to evaluate ridge regression algorithms. | ||
|
|
||
| ### [SO WHAT] | ||
| This PR establishes the foundation for all algorithm comparisons and ensures that experimental results are reproducible and meaningful. | ||
|
|
||
| ## PR 2: Units.jl and Corresponding Tests | ||
| **Expected Date:** June 30, 2026 | ||
|
|
||
| ### [WHY] | ||
| This project requires our experimental units to be defined in accordance with the experimental design. The experimental units need to have certain properties and be consistent and reproducible throughout the experiment. | ||
|
|
||
| ### [WHAT] | ||
| This PR introduces `units.jl` and the corresponding tests. The module will provide a framework for generating and managing experimental units that conform to the specifications established in the experimental design, including factors such as problem dimension and conditioning. It will also ensure that all generated units are reproducible, consistent across experimental conditions, and validated through unit testing. | ||
|
|
||
| ### [HOW] | ||
| This PR ensures that `units.jl` and the corresponding tests satisfy the experimental design requirements through a combination of unit testing, code coverage analysis, and end-to-end pipeline validation. Unit tests will verify that the generated experimental units possess the properties specified in the design. Code coverage will be measured to confirm that the tests exercise the code adequately, including edge cases. In addition, end-to-end pipeline checks will verify that the experimental units are compatible with the algorithms, measurements, and observations added in later PRs. | ||
|
|
||
| ### [SO WHAT] | ||
| This PR ensures that the generated experimental units are consistent with our design. It allows us to apply treatments (ridge regression algorithms) to those units and to collect the measurements and observations needed to analyze and compare the performance of these algorithms. | ||
|
|
||
| ### [FILES AND FUNCTIONS] | ||
| | File | Structure / Function | Purpose | | ||
| |------|----------------------|---------| | ||
| | `src/units.jl` | `Dataset{TX<:AbstractMatrix, TY<:AbstractVector}` | Defines a dataset as an experimental unit for ridge regression experiments. Stores the design matrix `X`, response vector `y`, and dataset `name` while allowing dense or sparse matrix types. | | ||
| | `src/units.jl` | `Dataset(name::String, X::AbstractMatrix, y::AbstractVector)` | Constructs a `Dataset` object and validates that the number of rows in `X` matches the length of `y`. | | ||
| | `src/units.jl` | `one_hot_encode(Xdf::DataFrame; cols_to_encode, drop_first=true)` | Converts selected categorical columns in a feature `DataFrame` into numeric dummy variables while leaving numeric columns unchanged. | | ||
| | `src/units.jl` | `load_csv_dataset(path_or_url::String; target_col, cols_to_encode=Symbol[], name="csv_dataset")` | Loads a dataset from a local CSV file or URL, removes missing observations, separates features from the target column, applies one-hot encoding, and returns a `Dataset` object. | | ||
| | `test/src/units/units_dataset_tests.jl` | `Dataset` constructor tests | Verify that valid matrices and response vectors produce a `Dataset`, and that mismatched dimensions throw an `ArgumentError`. | | ||
| | `test/src/units/units_one_hot_encode_tests.jl` | Encoding tests | Verify that categorical variables are correctly one-hot encoded, that numeric columns are preserved, and that invalid nonnumeric columns trigger appropriate errors. | | ||
| | `test/src/units/units_load_csv_dataset_tests.jl` | CSV-loading tests | Verify that CSV data can be loaded, cleaned, encoded, and converted into a valid `Dataset` object. | | ||
| | `test/src/units/units_end_to_end_tests.jl` | End-to-end dataset pipeline tests | Verify that raw tabular data can move through the full pipeline: CSV loading, preprocessing, encoding, dataset construction, and compatibility with downstream ridge regression routines. | | ||
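
As a rough illustration of the `Dataset` rows in the table above, the type and its validating constructor might look like the following. This is only a sketch under stated assumptions: the field order, the inner-constructor form, and the error message are hypothetical, and the actual `src/units.jl` may differ.

```julia
# Hypothetical sketch of the `Dataset` experimental unit described in the table.
# Field names (`name`, `X`, `y`) follow the table; everything else is assumed.
struct Dataset{TX<:AbstractMatrix,TY<:AbstractVector}
    name::String
    X::TX
    y::TY

    # Inner constructor: reject inputs whose dimensions disagree.
    function Dataset(name::String, X::AbstractMatrix, y::AbstractVector)
        if size(X, 1) != length(y)
            throw(ArgumentError("X has $(size(X, 1)) rows but y has $(length(y)) entries"))
        end
        return new{typeof(X),typeof(y)}(name, X, y)
    end
end

# Usage: a 3-by-2 design matrix with a matching response vector.
ds = Dataset("toy", [1.0 2.0; 3.0 4.0; 5.0 6.0], [1.0, 2.0, 3.0])
```

Because the matrix and vector types are parameters, the same definition accepts dense or sparse inputs without modification.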
|
|
||
| ## PR 3: Golub-Kahan Bidiagonalization and Corresponding Tests | ||
| **Expected Date:** June 30, 2026 | ||
|
|
||
| ### [WHY] | ||
| This project requires efficient and stable methods for solving ridge regression problems. Direct methods are an important baseline against which iterative and stochastic approaches can be compared. Golub-Kahan bidiagonalization is a direct method that transforms the data matrix into a bidiagonal form through a series of orthogonal transformations, yielding a simpler problem that can be solved more efficiently. | ||
| ### [WHAT] | ||
| This PR introduces `bidiagonalization.jl` and the corresponding tests. The module implements Golub-Kahan bidiagonalization using a sequence of Givens rotations to reduce the matrix to upper bidiagonal form. The implementation includes routines for computing Givens rotation coefficients, applying orthogonal transformations to matrix rows and columns, accumulating the orthogonal matrices (H and K), and applying the resulting transformations to the constant vector (the right-hand side `b`). The module serves as the project's first direct method for solving ridge regression problems. | ||
|
|
||
| ### [HOW] | ||
| This PR ensures correctness and reliability through a combination of unit testing, code coverage analysis, structural validation, and end-to-end pipeline checks. Unit tests will verify the correctness of Givens rotation coefficients, row and column transformations, and bidiagonalization procedures on square and rectangular matrices. Structural tests will confirm that the computed matrices satisfy the expected properties, including orthogonality of H and K, preservation of matrix dimensions, and the relation `H^T A K = B`, where B is upper bidiagonal. Code coverage analysis will be used to ensure that core computational paths and edge cases are exercised, while end-to-end tests will verify compatibility with downstream ridge regression solvers and benchmarking routines. | ||
|
|
||
| ### [SO WHAT] | ||
| This PR is the first algorithm (treatment) in the project and establishes a direct method baseline for comparison with future gradient-based and stochastic approaches. | ||
|
|
||
| ### [FILES AND FUNCTIONS] | ||
| | File | Structure / Function | Purpose | | ||
| |------|----------------------|---------| | ||
| | `src/algorithms/bidiagonalization.jl` | `compute_givens(...)` | Computes the cosine and sine coefficients defining a Givens rotation used to eliminate selected matrix entries. | | ||
| | `src/algorithms/bidiagonalization.jl` | `rotate_rows!(...)` | Applies a Givens rotation to two rows of a matrix during the left-transformation stage of the bidiagonalization procedure. | | ||
| | `src/algorithms/bidiagonalization.jl` | `rotate_cols!(...)` | Applies a Givens rotation to two columns of a matrix during the right-transformation stage of the bidiagonalization procedure. | | ||
| | `src/algorithms/bidiagonalization.jl` | `apply_Ht_to_b(...)` | Applies the accumulated left orthogonal transformations to the constant vector, producing the transformed right-hand side of the reduced problem. | | ||
| | `src/algorithms/bidiagonalization.jl` | `bidiagonalize_with_H(...)` | Performs Golub–Kahan Bidiagonalization using Givens rotations and accumulates the orthogonal transformations required to reduce a matrix to upper bidiagonal form. | | ||
| | `test/src/algorithms/bidiagonalization/bidiagonalization_compute_givens_tests.jl` | Givens rotation tests | Verify that the computed rotation coefficients satisfy the expected numerical and trigonometric properties. | | ||
| | `test/src/algorithms/bidiagonalization/bidiagonalization_rotate_rows_tests.jl` | Row rotation tests | Verify that row rotations correctly eliminate targeted entries while preserving orthogonality. | | ||
| | `test/src/algorithms/bidiagonalization/bidiagonalization_rotate_cols_tests.jl` | Column rotation tests | Verify that column rotations correctly eliminate targeted entries while preserving orthogonality. | | ||
| | `test/src/algorithms/bidiagonalization/bidiagonalization_apply_Ht_to_b_tests.jl` | Transformation tests | Verify that accumulated orthogonal transformations are correctly applied to the constant vector. | | ||
| | `test/src/algorithms/bidiagonalization/bidiagonalization_with_H_tests.jl` | Bidiagonalization tests | Verify that the resulting matrix is upper bidiagonal and that the accumulated orthogonal matrices satisfy the expected structural properties. | | ||
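
To make the first two rows of the table above concrete, here is a minimal sketch of a Givens rotation and its application to two matrix rows. The argument lists and return values are assumptions, since the table elides them with `(...)`; only the function names come from the table.

```julia
# Hypothetical signature: return (c, s, r) such that
# [c s; -s c] * [a, b] == [r, 0].
function compute_givens(a::Real, b::Real)
    if iszero(b)
        # Nothing to eliminate: use the identity rotation.
        return (one(float(a)), zero(float(a)), float(a))
    end
    r = hypot(a, b)  # sqrt(a^2 + b^2), computed without overflow
    return (a / r, b / r, r)
end

# Hypothetical signature: rotate rows i and k of A in place.
function rotate_rows!(A::AbstractMatrix, i::Integer, k::Integer, c::Real, s::Real)
    for j in axes(A, 2)
        aij, akj = A[i, j], A[k, j]
        A[i, j] = c * aij + s * akj
        A[k, j] = -s * aij + c * akj
    end
    return A
end

# Usage: eliminate the (2, 1) entry of a small matrix.
A = [3.0 1.0; 4.0 2.0]
c, s, r = compute_givens(A[1, 1], A[2, 1])
rotate_rows!(A, 1, 2, c, s)
```

After the rotation, the entry `A[2, 1]` is zero up to rounding and `A[1, 1]` holds `r`, which is the kind of property the Givens and row rotation tests would check.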
|
|
||
|
|
||
| ## PR 4: Gradient-Based Optimization and Corresponding Tests | ||
| **Expected Date:** July 6, 2026 | ||
|
|
||
| ### [WHY] | ||
| This project requires efficient and stable methods for solving ridge regression problems. Iterative optimization methods such as gradient descent provide an alternative to direct methods, which may become prohibitively expensive as the problem dimensions grow. | ||
| ### [WHAT] | ||
| This PR introduces `gradient_descent.jl` and the corresponding tests. The module will implement gradient-based optimization methods for solving ridge regression problems, including routines for evaluating objective functions, computing gradients, and performing iterative updates. | ||
|
|
||
| ### [HOW] | ||
| This PR ensures correctness and reliability through a combination of unit testing, code coverage analysis, convergence validation, and end-to-end pipeline checks. | ||
|
|
||
| ### [SO WHAT] | ||
| This PR establishes the project's first iterative optimization baseline and provides a foundation for comparing direct and optimization-based approaches to ridge regression. The resulting implementation will enable experiments evaluating convergence behavior, computational efficiency, solution accuracy, and robustness across a variety of problem settings. | ||
|
|
||
| ### [FILES AND FUNCTIONS] | ||
| The exact function names may evolve, but this PR is expected to include the following core components: | ||
| | File | Structure / Function | Purpose | | ||
| |------|----------------------|---------| | ||
| | `src/algorithms/gradient_descent.jl` | `ridge_objective_evaluation(...)` | Evaluate the ridge regression objective for a given coefficient vector. | | ||
| | `src/algorithms/gradient_descent.jl` | `ridge_gradient_calculation(...)` | Compute the gradient of the ridge regression objective. | | ||
| | `src/algorithms/gradient_descent.jl` | `gradient_descent(...)` | Implement the main iterative update procedure for solving ridge regression problems. | | ||
| | `src/algorithms/gradient_descent.jl` | `stopping_criterion(...)` | Determine when the iterative method should terminate based on tolerance, maximum iterations, or convergence behavior. | | ||
| | `src/algorithms/gradient_descent.jl` | `gradient_descent_results(...)` | Store or return solution information such as coefficients, objective values, iteration count, and convergence status. | | ||
| | `test/src/algorithms/gradient_descent_test/gradient_descent_objective_gradient_test.jl` | Objective and gradient tests | Verify that objective values and gradients are computed correctly. | | ||
| | `test/src/algorithms/gradient_descent_test/gradient_descent_update_rule_test.jl` | Update rule tests | Verify that gradient descent updates move in the expected direction and reduce the objective under appropriate conditions. | | ||
| | `test/src/algorithms/gradient_descent_test/gradient_descent_convergence_test.jl` | Convergence tests | Verify that the method approaches known ridge regression solutions on small benchmark problems. | | ||
| | `test/src/algorithms/gradient_descent_test/gradient_descent_pipeline_test.jl` | End-to-end pipeline tests | Verify compatibility with experimental units, benchmarking routines, and downstream measurement collection. | | ||
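
The components listed above could fit together roughly as follows. This sketch assumes the objective `f(β) = ½‖Xβ - y‖² + (λ/2)‖β‖²`, a fixed step size, and a gradient-norm stopping rule; the scaling of the objective, the argument lists, and the returned fields are all assumptions rather than the project's final design.

```julia
using LinearAlgebra

# Assumed objective: f(β) = 0.5 * ‖Xβ - y‖² + 0.5 * λ * ‖β‖².
ridge_objective_evaluation(X, y, λ, β) =
    0.5 * sum(abs2, X * β - y) + 0.5 * λ * sum(abs2, β)

# Gradient of the assumed objective: X'(Xβ - y) + λβ.
ridge_gradient_calculation(X, y, λ, β) = X' * (X * β - y) + λ * β

# Fixed-step gradient descent; stops once the gradient norm is at most `tol`.
function gradient_descent(X, y, λ; step, tol=1e-8, maxiter=10_000)
    β = zeros(size(X, 2))
    for iter in 1:maxiter
        g = ridge_gradient_calculation(X, y, λ, β)
        norm(g) <= tol && return (β=β, iterations=iter - 1, converged=true)
        β -= step * g
    end
    return (β=β, iterations=maxiter, converged=false)
end

# Usage on a small problem, compared against the closed-form solution.
X = [1.0 0.0; 0.0 2.0; 1.0 1.0]
y = [1.0, 2.0, 3.0]
λ = 0.1
# Step 1/L, where L = ‖X‖₂² + λ is the Lipschitz constant of the gradient.
result = gradient_descent(X, y, λ; step=1 / (opnorm(X)^2 + λ))
β_closed = (X' * X + λ * I) \ (X' * y)
```

Comparing `result.β` with `β_closed` on a small problem like this is one way the convergence tests could be phrased.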
|
|
||
| ## PR 5: Stochastic Optimization and Corresponding Tests | ||
| **Expected Date:** July 30, 2026 | ||
|
|
||
| ### [WHY] | ||
| This project requires solving the ridge regression problem in settings where traditional methods may become infeasible. In large-scale settings, stochastic optimization methods provide an alternative by approximating the optimization problem using random samples rather than processing the entire dataset at every iteration. These methods are particularly useful when the dimensions of the problem become too large to process the entire dataset at every iteration. | ||
|
|
||
| ### [WHAT] | ||
| This PR introduces stochastic optimization methods and the corresponding tests. The module will implement stochastic optimization methods for solving ridge regression problems, including stochastic gradient descent and related variants. | ||
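
For orientation, one possible shape of a stochastic gradient descent routine is sketched below. The function name, its keyword arguments, the sum-form objective `f(β) = ½ Σᵢ (xᵢ'β - yᵢ)² + (λ/2)‖β‖²`, and the step-size schedule are all assumptions chosen for illustration; the PR may settle on different variants.

```julia
using LinearAlgebra, Random

# Hypothetical name and signature; one sample per update, reshuffled each epoch.
function stochastic_gradient_descent(X, y, λ; epochs=50, rng=Random.default_rng())
    n, d = size(X)
    β = zeros(d)
    # Conservative base step: inverse of the largest per-sample curvature.
    η0 = 1 / (n * maximum(sum(abs2, X; dims=2)) + λ)
    for epoch in 1:epochs
        η = η0 / sqrt(epoch)           # slowly decaying step size
        for i in randperm(rng, n)      # visit the rows in random order
            xi = view(X, i, :)
            # Scaling the row-i term by n matches the full data-term gradient on average.
            g = n * (dot(xi, β) - y[i]) * xi + λ * β
            β -= η * g
        end
    end
    return β
end

# Usage on synthetic, noise-free data (for illustration only).
rng = MersenneTwister(1)
X = randn(rng, 50, 3)
β_true = [1.0, -2.0, 0.5]
y = X * β_true
λ = 0.1
β_sgd = stochastic_gradient_descent(X, y, λ; rng=rng)
ridge_obj(β) = 0.5 * sum(abs2, X * β - y) + 0.5 * λ * sum(abs2, β)
```

Each update touches a single row of `X`, so the per-iteration cost does not depend on the number of observations.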
|
|
||
| ### [HOW] | ||
| This PR ensures correctness and reliability through a combination of unit testing, code coverage analysis, convergence validation, and end-to-end pipeline checks. | ||
|
|
||
| ### [SO WHAT] | ||
| This PR extends the project's ability to solve ridge regression problems beyond the regimes where direct and traditional iterative methods are practical. By introducing stochastic optimization methods, we can apply treatments to larger experimental units and collect the measurements and observations necessary to compare algorithm performance across a broader range of problem dimensions and computational settings. | ||
|
|
||
| ### [TO DO] | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -14,6 +14,7 @@ makedocs(; | |
| ), | ||
| pages=[ | ||
| "Home" => "index.md", | ||
| "Design" => "design.md", | ||
| ], | ||
| ) | ||
|
|
||
|
|
||
Review comment:
treatments.jl
measurements.jl