Commits
43 commits
7dc2968
Add dataset utilities and tests
EtonT471 Mar 2, 2026
47fc954
Adding dataset_tests.jl
EtonT471 Mar 3, 2026
787a88d
Small changes to design.md
EtonT471 Mar 3, 2026
2dd2295
March 16 Updates
EtonT471 Mar 17, 2026
173cdd1
Ridge Regression file
EtonT471 Mar 17, 2026
042d5d6
dataset.jl small update
EtonT471 Mar 17, 2026
70cfa67
Updated Experimental Units and Treatments Sections
EtonT471 Mar 17, 2026
9b8c48c
Small changes
EtonT471 Mar 17, 2026
6c923ab
Bidiagonalization Stuff
EtonT471 Mar 17, 2026
6727a90
Adding Linear Algebra to Project.toml
EtonT471 Mar 17, 2026
358e881
Design Branch
EtonT471 Mar 17, 2026
2ad65a9
Updated design 3/19
EtonT471 Mar 19, 2026
fc2595f
3/19 edits
EtonT471 Mar 19, 2026
788665c
Update experimental design document
EtonT471 Mar 24, 2026
698aff7
Remove code files from design branch
EtonT471 Mar 24, 2026
f904351
Compiling issues
EtonT471 Mar 24, 2026
49c8a81
Attempt 1
EtonT471 Mar 24, 2026
8529e34
Minor adjustment
EtonT471 Mar 24, 2026
3734a46
restoring .jl files
EtonT471 Mar 24, 2026
48edb29
moving to src
EtonT471 Mar 24, 2026
8c2e5bc
Put source file back into the folder.
vp314 Mar 24, 2026
8192f16
Delete docs/src/RidgeRegression.jl
vp314 Mar 24, 2026
8a6525e
Source folder re-added
EtonT471 Mar 24, 2026
dc10590
Fixed math notation
vp314 Mar 24, 2026
6b893ef
recent changes
EtonT471 Apr 7, 2026
a605e40
Updated design.md
EtonT471 Apr 7, 2026
d9a63e7
Remove RidgeRegression.jl changes from Design branch
EtonT471 Apr 7, 2026
c29b983
fixing compiling
EtonT471 Apr 7, 2026
0ba580c
Fixing dollar signs
EtonT471 Apr 7, 2026
e080193
XtopX change
EtonT471 Apr 7, 2026
26a6198
Update 4/13
EtonT471 Apr 13, 2026
cf1e17e
4/14 Update
EtonT471 Apr 15, 2026
9ca4815
4/28 updates
EtonT471 Apr 28, 2026
2c641dd
Fixed compiling issue
EtonT471 Apr 28, 2026
06f1c06
More compiling fixed
EtonT471 Apr 28, 2026
c932604
Updates to experimental design
vp314 May 21, 2026
069bec2
6/17 update
etontackett Jun 18, 2026
e8c05ed
Night changes
etontackett Jun 19, 2026
15336a7
PR Schedule
etontackett Jun 19, 2026
7818e84
Readme Changes need to add more PRs
etontackett Jun 23, 2026
6596141
Final changes
etontackett Jun 23, 2026
6254257
Apply suggestion from @vp314
etontackett Jun 29, 2026
59fb5ce
6/30 Changes
etontackett Jun 30, 2026
1 change: 1 addition & 0 deletions .gitignore
@@ -5,3 +5,4 @@
/docs/build/
Manifest.toml
build/
.DS_Store
10 changes: 9 additions & 1 deletion Project.toml
@@ -1,7 +1,15 @@
name = "RidgeRegression"
uuid = "739161c8-60e1-4c49-8f89-ff30998444b1"
authors = ["Vivak Patel <vp314@users.noreply.github.com>"]
version = "0.1.0"
authors = ["Eton Tackett <etont@icloud.com>", "Vivak Patel <vp314@users.noreply.github.com>"]

[deps]
CSV = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b"
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
Downloads = "f43a241f-c20a-4ad4-852c-f6b1247861c6"

[compat]
CSV = "0.10.15"
DataFrames = "1.8.1"
Downloads = "1.7.0"
julia = "1.12.4"
177 changes: 177 additions & 0 deletions README.md
@@ -5,3 +5,180 @@
[![Build Status](https://github.com/vp314/RidgeRegression.jl/actions/workflows/CI.yml/badge.svg?branch=main)](https://github.com/vp314/RidgeRegression.jl/actions/workflows/CI.yml?query=branch%3Amain)
[![Coverage](https://codecov.io/gh/vp314/RidgeRegression.jl/branch/main/graph/badge.svg)](https://codecov.io/gh/vp314/RidgeRegression.jl)
[![Code Style: Blue](https://img.shields.io/badge/code%20style-blue-4495d1.svg)](https://github.com/invenia/BlueStyle)

# Project Overview

This project investigates the performance of numerical algorithms for solving the ridge regression problem under varying dimension regimes and conditioning levels.

# Directory Structure

The source-code layout will be structured as follows:
```text
.
├── Project.toml
├── src
│   ├── RidgeRegression.jl
│   ├── units.jl
│   ├── treatments.jl
│   ├── measurements.jl
│   └── algorithms
│       ├── closed_form.jl
│       ├── gradient_descent.jl
│       ├── stochastic_gradient_descent.jl
│       └── bidiagonalization.jl
├── test
│   ├── Project.toml
│   ├── runtests.jl
│   └── src
│       ├── RidgeRegression_test.jl
│       ├── units
│       │   ├── units_dataset_tests.jl
│       │   ├── units_one_hot_encode_tests.jl
│       │   ├── units_end_to_end_tests.jl
│       │   └── units_load_csv_dataset_tests.jl
│       ├── treatments
│       │   └── treatments_test.jl
│       ├── measurements
│       │   └── measurements_test.jl
│       └── algorithms
│           ├── closed_form_test.jl
│           ├── gradient_descent_test.jl
│           ├── stochastic_gradient_descent_test.jl
│           └── bidiagonalization
│               ├── bidiagonalization_compute_givens_tests.jl
│               ├── bidiagonalization_rotate_rows_tests.jl
│               ├── bidiagonalization_rotate_cols_tests.jl
│               ├── bidiagonalization_apply_Ht_to_b_tests.jl
│               └── bidiagonalization_with_H_tests.jl
└── docs
    ├── make.jl
    └── src
        ├── design.md
        ├── getting_started_guide.md
        ├── experimental_pipeline.md
        ├── algorithm_explanations.md
        ├── dataset.md
        ├── measurements.md
        ├── output.md
        └── index.md
```

Review comment (Owner), on the `units.jl` line:

> treatments.jl
> measurements.jl

Review comment (Owner), on lines +56 to +66:

> Do some background research on documentation for software. Think about what makes the most sense for you.
>
> Also think about the fact that you have an experiment that this software runs. So it should clearly indicate entry points to the code and exit points (e.g., tables, figures, datasets, etc.). So your documentation should also explain this as well.
>
> For instance, "design.md" is not something you would find in standard software documentation but it is something you would find here.

# PR Schedule and Roadmap

## PR 1: Experimental Design
**Expected Date:** June 19, 2026

### [WHY]
A rigorous experimental framework ensures that the ridge regression algorithms are evaluated under identical conditions, enabling fair and reproducible comparisons across methods.

### [WHAT]
This PR introduces the experimental design. The primary deliverable is a design document that specifies the structure of the study, including the experimental units, treatments, blocking factors, measurements, and observations that will be used to evaluate ridge regression algorithms.

### [HOW]
This PR delivers the experimental design as a formal design document in `docs/src/design.md`. The document will define the experimental units, treatments, blocking factors, measurements, and observations used throughout the project. It will also specify the problem dimensions, conditioning regimes, noise levels, regularization parameters, and benchmark metrics used to evaluate ridge regression algorithms.

### [SO WHAT]
This PR establishes the foundation for all algorithm comparisons and ensures that experimental results are reproducible and meaningful.

## PR 2: Units.jl and Corresponding Tests
**Expected Date:** June 30, 2026

### [WHY]
This project requires experimental units that are defined in accordance with the experimental design. The units must have the properties the design specifies and must remain consistent and reproducible throughout the experiment.

### [WHAT]
This PR introduces `units.jl` and the corresponding tests. The module will provide a framework for generating and managing experimental units that conform to the specifications established in the experimental design, including factors such as problem dimension and conditioning. It will also ensure that all generated units are reproducible, consistent across experimental conditions, and validated through unit testing.

### [HOW]
This PR ensures that `units.jl` and the corresponding tests satisfy the experimental design requirements through a combination of unit testing, code coverage analysis, and end-to-end pipeline validation. Unit tests will verify that the generated experimental units possess the properties specified in the design. Code coverage will be measured to confirm that the core paths and edge cases are exercised. In addition, end-to-end pipeline checks will verify that the experimental units are compatible with the algorithms, measurements, and observations added in later PRs.

### [SO WHAT]
This PR ensures that the generated experimental units are consistent with our design. With these units in place, we can apply treatments (ridge regression algorithms) and collect the measurements and observations needed to analyze and compare algorithm performance.

### [FILES AND FUNCTIONS]
| File | Structure / Function | Purpose |
|------|----------------------|---------|
| `src/units.jl` | `Dataset{TX<:AbstractMatrix, TY<:AbstractVector}` | Defines a dataset as an experimental unit for ridge regression experiments. Stores the design matrix `X`, response vector `y`, and dataset `name` while allowing dense or sparse matrix types. |
| `src/units.jl` | `Dataset(name::String, X::AbstractMatrix, y::AbstractVector)` | Constructs a `Dataset` object and validates that the number of rows in `X` matches the length of `y`. |
| `src/units.jl` | `one_hot_encode(Xdf::DataFrame; cols_to_encode, drop_first=true)` | Converts selected categorical columns in a feature `DataFrame` into numeric dummy variables while leaving numeric columns unchanged. |
| `src/units.jl` | `load_csv_dataset(path_or_url::String; target_col, cols_to_encode=Symbol[], name="csv_dataset")` | Loads a dataset from a local CSV file or URL, removes missing observations, separates features from the target column, applies one-hot encoding, and returns a `Dataset` object. |
| `test/src/units/units_dataset_tests.jl` | `Dataset` constructor tests | Verify that valid matrices and response vectors produce a `Dataset`, and that mismatched dimensions throw an `ArgumentError`. |
| `test/src/units/units_one_hot_encode_tests.jl` | Encoding tests | Verify that categorical variables are correctly one-hot encoded, that numeric columns are preserved, and that invalid nonnumeric columns trigger appropriate errors. |
| `test/src/units/units_load_csv_dataset_tests.jl` | CSV-loading tests | Verify that CSV data can be loaded, cleaned, encoded, and converted into a valid `Dataset` object. |
| `test/src/units/units_end_to_end_tests.jl` | End-to-end dataset pipeline tests | Verify that raw tabular data can move through the full pipeline: CSV loading, preprocessing, encoding, dataset construction, and compatibility with downstream ridge regression routines. |
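
As a rough illustration of the `Dataset` rows in the table above, a minimal sketch of such a type might look like the following. The field order, the inner-constructor style, and the error message are assumptions made for illustration; they are not taken from `src/units.jl`.

```julia
# Hypothetical sketch of the `Dataset` experimental unit; not the package's source.
struct Dataset{TX<:AbstractMatrix,TY<:AbstractVector}
    name::String
    X::TX
    y::TY

    # Validate that X and y describe the same number of observations.
    function Dataset(name::String, X::AbstractMatrix, y::AbstractVector)
        size(X, 1) == length(y) || throw(
            ArgumentError("X has $(size(X, 1)) rows but y has $(length(y)) entries"),
        )
        return new{typeof(X),typeof(y)}(name, X, y)
    end
end

# A small dense unit with three observations and two features.
unit = Dataset("toy", [1.0 2.0; 3.0 4.0; 5.0 6.0], [1.0, 2.0, 3.0])
```

Because the matrix and vector types are parameters, the same definition would accept a sparse design matrix without any changes.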

## PR 3: Golub-Kahan Bidiagonalization and Corresponding Tests
**Expected Date:** June 30, 2026

### [WHY]
This project requires efficient and stable methods for solving ridge regression problems. Direct methods are an important baseline against which iterative and stochastic approaches can be compared. Golub-Kahan bidiagonalization is a direct method that transforms the data matrix into a bidiagonal form through a series of orthogonal transformations, yielding a simpler problem that can be solved more efficiently.

### [WHAT]
This PR introduces `bidiagonalization.jl` and the corresponding tests. The module implements Golub-Kahan bidiagonalization using a sequence of Givens rotations that reduce the matrix to upper bidiagonal form. The implementation includes routines for computing Givens rotation coefficients, applying orthogonal transformations to matrix rows and columns, accumulating the orthogonal matrices (H and K), and applying the resulting transformations to the constant vector. The module serves as the project's first direct method for solving ridge regression problems.

### [HOW]
This PR ensures correctness and reliability through a combination of unit testing, code coverage analysis, structural validation, and end-to-end pipeline checks. Unit tests will verify the correctness of Givens rotation coefficients, row and column transformations, and bidiagonalization procedures on square and rectangular matrices. Structural tests will confirm that the computed matrices satisfy the expected properties, including orthogonality of H and K, preservation of matrix dimensions, and the relation $H^\top A K = B$, where $B$ is upper bidiagonal. Code coverage analysis will be used to ensure that core computational paths and edge cases are exercised, while end-to-end tests will verify compatibility with downstream ridge regression solvers and benchmarking routines.
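
The structural checks described above could be sketched roughly as follows. The helper names `is_upper_bidiagonal` and `is_orthogonal` and the tolerance are hypothetical, and the matrices are built by hand only to exercise the checks; the sketch does not call the package's bidiagonalization routine.

```julia
using LinearAlgebra

# Hypothetical helper: true when every entry outside the main diagonal and
# the first superdiagonal is zero up to `atol`.
function is_upper_bidiagonal(B::AbstractMatrix; atol::Real=1e-10)
    for j in axes(B, 2), i in axes(B, 1)
        on_band = (j == i) || (j == i + 1)
        if !on_band && abs(B[i, j]) > atol
            return false
        end
    end
    return true
end

# Hypothetical helper: true when the square matrix Q satisfies Q'Q = I up to `atol`.
is_orthogonal(Q::AbstractMatrix; atol::Real=1e-10) = norm(Q' * Q - I) <= atol

# Hand-built example: choose B, H, and K first, then form A so that H' * A * K = B.
B = Matrix(Bidiagonal([1.0, 2.0, 3.0], [4.0, 5.0], :U))
θ = 0.3
H = [cos(θ) -sin(θ) 0.0; sin(θ) cos(θ) 0.0; 0.0 0.0 1.0]
K = [1.0 0.0 0.0; 0.0 cos(θ) sin(θ); 0.0 -sin(θ) cos(θ)]
A = H * B * K'

checks_pass = is_orthogonal(H) && is_orthogonal(K) && is_upper_bidiagonal(H' * A * K)
```

A real structural test would replace the hand-built `H`, `K`, and `B` with the outputs of the bidiagonalization routine and apply the same three checks.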

### [SO WHAT]
This PR delivers the first algorithm (treatment) in the project and establishes a direct-method baseline for comparison with future gradient-based and stochastic approaches.

### [FILES AND FUNCTIONS]
| File | Structure / Function | Purpose |
|------|----------------------|---------|
| `src/algorithms/bidiagonalization.jl` | `compute_givens(...)` | Computes the cosine and sine coefficients defining a Givens rotation used to eliminate selected matrix entries. |
| `src/algorithms/bidiagonalization.jl` | `rotate_rows!(...)` | Applies a Givens rotation to two rows of a matrix during the left-transformation stage of the bidiagonalization procedure. |
| `src/algorithms/bidiagonalization.jl` | `rotate_cols!(...)` | Applies a Givens rotation to two columns of a matrix during the right-transformation stage of the bidiagonalization procedure. |
| `src/algorithms/bidiagonalization.jl` | `apply_Ht_to_b(...)` | Applies the accumulated left orthogonal transformations to the constant vector, producing the transformed right-hand side of the reduced problem. |
| `src/algorithms/bidiagonalization.jl` | `bidiagonalize_with_H(...)` | Performs Golub–Kahan Bidiagonalization using Givens rotations and accumulates the orthogonal transformations required to reduce a matrix to upper bidiagonal form. |
| `test/src/algorithms/bidiagonalization/bidiagonalization_compute_givens_test.jl` | Givens rotation tests | Verify that the computed rotation coefficients satisfy the expected numerical and trigonometric properties. |
| `test/src/algorithms/bidiagonalization/bidiagonalization_rotate_rows_test.jl` | Row rotation tests | Verify that row rotations correctly eliminate targeted entries while preserving orthogonality. |
| `test/src/algorithms/bidiagonalization/bidiagonalization_rotate_cols_test.jl` | Column rotation tests | Verify that column rotations correctly eliminate targeted entries while preserving orthogonality. |
| `test/src/algorithms/bidiagonalization/bidiagonalization_apply_Ht_to_b_test.jl` | Transformation tests | Verify that accumulated orthogonal transformations are correctly applied to the constant vector. |
| `test/src/algorithms/bidiagonalization/bidiagonalization_with_H_test.jl` | Bidiagonalization tests | Verify that the resulting matrix is upper bidiagonal and that the accumulated orthogonal matrices satisfy the expected structural properties. |
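
To make the Givens rotation rows above concrete, here is a minimal sketch of how rotation coefficients can be computed. The function name `givens_coefficients`, its return values, and the sign convention are assumptions for illustration; `compute_givens` in the package may differ.

```julia
using LinearAlgebra

# Hypothetical helper: returns (c, s, r) such that [c s; -s c] * [a, b] == [r, 0].
function givens_coefficients(a::Real, b::Real)
    r = hypot(a, b)                          # avoids overflow in sqrt(a^2 + b^2)
    r == 0 && return (one(r), zero(r), r)    # nothing to eliminate: use the identity
    return (a / r, b / r, r)
end

c, s, r = givens_coefficients(3.0, 4.0)
G = [c s; -s c]          # 2x2 rotation built from the coefficients
rotated = G * [3.0, 4.0] # second entry is eliminated
```

The coefficients satisfy `c^2 + s^2 = 1`, so `G` is orthogonal, which is the property the rotation tests in the table are described as checking.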


## PR 4: Gradient-Based Optimization and Corresponding Tests
**Expected Date:** July 6, 2026

### [WHY]
This project requires efficient and stable methods for solving ridge regression problems. Iterative optimization methods such as gradient descent provide an alternative to direct methods, which can become prohibitively costly for large problems.

### [WHAT]
This PR introduces `gradient_descent.jl` and the corresponding tests. The module will implement gradient-based optimization methods for solving ridge regression problems, including routines for evaluating objective functions, computing gradients, and performing iterative updates.
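
A minimal sketch of these three pieces (objective, gradient, iterative update) is shown below. It assumes the objective $f(\beta) = \tfrac{1}{2}\lVert X\beta - y\rVert^2 + \tfrac{\lambda}{2}\lVert\beta\rVert^2$, a fixed step size, and a gradient-norm stopping rule; the scaling of the objective, the function names, and the defaults are all assumptions and may not match the planned implementation.

```julia
using LinearAlgebra

# Assumed objective: 0.5 * ||Xβ - y||^2 + 0.5 * λ * ||β||^2
ridge_objective(X, y, β, λ) = 0.5 * sum(abs2, X * β - y) + 0.5 * λ * sum(abs2, β)

# Gradient of the assumed objective: X'(Xβ - y) + λβ
ridge_gradient(X, y, β, λ) = X' * (X * β - y) + λ * β

# Fixed-step gradient descent (hypothetical name); stops once the gradient
# norm falls below `tol` or after `maxiter` updates.
function gradient_descent_sketch(X, y, λ; step, tol=1e-10, maxiter=10_000)
    β = zeros(size(X, 2))
    for k in 1:maxiter
        g = ridge_gradient(X, y, β, λ)
        norm(g) <= tol && return β, k - 1
        β -= step * g
    end
    return β, maxiter
end

X = [1.0 0.0; 0.0 2.0; 1.0 1.0]
y = [1.0, 2.0, 3.0]
λ = 0.1
step = 1 / (opnorm(X)^2 + λ)   # 1/L, with L the Lipschitz constant of the gradient
β_gd, iters = gradient_descent_sketch(X, y, λ; step=step)

# Closed-form solution of the same assumed problem, for comparison.
β_direct = (X' * X + λ * I) \ (X' * y)
```

Comparing `β_gd` with `β_direct` on a small problem like this is one way the convergence tests listed for this PR could be phrased.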

### [HOW]
This PR ensures correctness and reliability through a combination of unit testing, code coverage analysis, convergence validation, and end-to-end pipeline checks.

### [SO WHAT]
This PR establishes the project's first iterative optimization baseline and provides a foundation for comparing direct and optimization-based approaches to ridge regression. The resulting implementation will enable experiments evaluating convergence behavior, computational efficiency, solution accuracy, and robustness across a variety of problem settings.

### [FILES AND FUNCTIONS]
The exact function names may evolve, but this PR is expected to include the following core components:

| File | Structure / Function | Purpose |
|------|----------------------|---------|
| `src/algorithms/gradient_descent.jl` | `ridge_objective_evaluation(...)` | Evaluate the ridge regression objective for a given coefficient vector. |
| `src/algorithms/gradient_descent.jl` | `ridge_gradient_calculation(...)` | Compute the gradient of the ridge regression objective. |
| `src/algorithms/gradient_descent.jl` | `gradient_descent(...)` | Implement the main iterative update procedure for solving ridge regression problems. |
| `src/algorithms/gradient_descent.jl` | `stopping_criterion(...)` | Determine when the iterative method should terminate based on tolerance, maximum iterations, or convergence behavior. |
| `src/algorithms/gradient_descent.jl` | `gradient_descent_results(...)` | Store or return solution information such as coefficients, objective values, iteration count, and convergence status. |
| `test/src/algorithms/gradient_descent_test/gradient_descent_objective_gradient_test.jl` | Objective and gradient tests | Verify that objective values and gradients are computed correctly. |
| `test/src/algorithms/gradient_descent_test/gradient_descent_update_rule_test.jl` | Update rule tests | Verify that gradient descent updates move in the expected direction and reduce the objective under appropriate conditions. |
| `test/src/algorithms/gradient_descent_test/gradient_descent_convergence_test.jl` | Convergence tests | Verify that the method approaches known ridge regression solutions on small benchmark problems. |
| `test/src/algorithms/gradient_descent_test/gradient_descent_pipeline_test.jl` | End-to-end pipeline tests | Verify compatibility with experimental units, benchmarking routines, and downstream measurement collection. |

## PR 5: Stochastic Optimization and Corresponding Tests
**Expected Date:** July 30, 2026

### [WHY]
This project requires solving the ridge regression problem in settings where traditional methods may become infeasible. In large-scale settings, stochastic optimization methods provide an alternative by approximating the optimization problem using random samples rather than processing the entire dataset at every iteration. These methods are particularly useful when the dimensions of the problem become too large.

### [WHAT]
This PR introduces stochastic optimization methods for solving ridge regression problems, including stochastic gradient descent and related variants, together with the corresponding tests.
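
As a rough sketch of the idea, the code below estimates the gradient from a random mini-batch of rows instead of the full dataset. It assumes the averaged objective $f(\beta) = \tfrac{1}{2n}\lVert X\beta - y\rVert^2 + \tfrac{\lambda}{2}\lVert\beta\rVert^2$; the function names, batch size, step-size schedule, and seed are hypothetical choices for illustration only.

```julia
using LinearAlgebra, Random

# Full gradient of the assumed averaged objective.
full_gradient(X, y, β, λ) = X' * (X * β - y) / size(X, 1) + λ * β

# Gradient estimate that uses only the rows listed in `batch`.
function minibatch_gradient(X, y, β, λ, batch)
    Xb = X[batch, :]
    return Xb' * (Xb * β - y[batch]) / length(batch) + λ * β
end

# Mini-batch SGD (hypothetical name) with a decaying step size.
function sgd_sketch(X, y, λ; batchsize=2, epochs=200, step0=0.1, rng=Random.default_rng())
    n = size(X, 1)
    β = zeros(size(X, 2))
    t = 0
    for _ in 1:epochs
        # Each epoch visits every row once, in a fresh random order.
        for batch in Iterators.partition(shuffle(rng, 1:n), batchsize)
            t += 1
            β -= (step0 / (1 + 0.01 * t)) * minibatch_gradient(X, y, β, λ, batch)
        end
    end
    return β
end

X = [1.0 0.0; 0.0 2.0; 1.0 1.0]
y = [1.0, 2.0, 3.0]
λ = 0.1
β_sgd = sgd_sketch(X, y, λ; rng=MersenneTwister(1))
```

Averaging the single-row estimates over all rows recovers `full_gradient` exactly, which is why the mini-batch gradient is an unbiased estimate when rows are sampled uniformly.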

### [HOW]
This PR ensures correctness and reliability through a combination of unit testing, code coverage analysis, convergence validation, and end-to-end pipeline checks.

### [SO WHAT]
This PR extends the project's ability to solve ridge regression problems beyond the regimes where direct and traditional iterative methods are practical. By introducing stochastic optimization methods, we can apply treatments to larger experimental units and collect the measurements and observations necessary to compare algorithm performance across a broader range of problem dimensions and computational settings.

### [TO DO]
1 change: 1 addition & 0 deletions docs/make.jl
@@ -14,6 +14,7 @@ makedocs(;
),
pages=[
"Home" => "index.md",
"Design" => "design.md",
],
)
