Polyfit

Easy Polynomial Fitting for Rust

Because you don't need to be able to build a power drill to use one safely


Statistics is hard, and linear regression is made entirely out of footguns. Curve fitting might be simple in theory, but there sure is a LOT of theory.

This library is designed for developers who need to make use of the plotting or predictive powers of a curve fit without needing to worry about Huber loss, D'Agostino, or what on earth a kurtosis is.

Built for developers, not statisticians

The crate is designed to be easy to use for developers who need curve fitting without having to understand all the underlying statistics.

  • Sensible defaults are provided for all parameters
  • The most common use-cases are covered by simple functions and macros
  • All documentation includes examples, explanations of parameters, and assumes zero prior knowledge of statistics
  • The API is designed to guide you towards best practices, while staying flexible enough for advanced users to customize behavior as needed
  • Includes a suite of testing tools to bridge the gap between statistical rigour and engineering pragmatism

Features & Capabilities

Polynomial Fitting

[ More Information ] [ Example ]

As a curve fitting library, Polyfit does curve fitting. What's more interesting is that I do it in a way that avoids the most common footguns, to make it intuitive and safe even with zero knowledge of statistics.

  • Functions and fits generic over floating point type
  • Human readable polynomial equation display
  • Cross-validation based scoring for very noisy data
  • Robust degree selection strategies that avoid overfitting
  • Built in outlier detection strategies
  • Lots of metrics, examples, and explanations to help you understand your data

Human Readable Equation Display

The crate includes a set of tools to help you display polynomials in a human-readable format:

  • See the display module for more details
  • All built-in basis options implement display::PolynomialDisplay to give you a nicely formatted equation like y(x) = 1.81x³ + 26.87x² - 1.00e3x + 9.03e3 (see the sketch below)
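
For instance, printing a fitted polynomial's equation is a one-liner. A minimal sketch, assuming the polynomial's Display output routes through display::PolynomialDisplay (the data below is a stand-in for your own):

// Fit a cubic in the Chebyshev basis, then print the readable equation
let data: Vec<(f64, f64)> = (0..20).map(|i| (i as f64, (i * i) as f64)).collect();
let fit = ChebyshevFit::new(&data, 3)?;
println!("{}", fit.as_polynomial()); // prints something like: y(x) = 1.00x² + 0.00x + 0.00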

11 Choices of Polynomial Basis [ How Do I Choose ] [ How Do I Check I Chose Well ]

A polynomial basis is the family of building-block functions whose weighted sum forms the fitted polynomial.

The crate includes a variety of polynomial bases to choose from, each with their own strengths and weaknesses. Polynomial and CurveFit are generic over the basis, so you can easily swap them out to see which works best for your data.

The choice of polynomial basis has massive implications for the shape of the curve, the numerical stability of the fit, and how well it fits!

  • Use the links above to help you choose the best basis for your data
  • This table gives hints at which basis to choose based on the characteristics of your data:

I also include basis_select!, a macro that will help you choose the best basis for your data.

  • It does an automatic fit for each basis I support, and scores them using the method of your choice.
  • It will show and plot the best 3
  • Use it a few times with real data and see which basis consistently comes out on top for your use-case (a sketch follows the table below)
| Basis Name  | Handles Curves Well | Repeating Patterns | Extremes / Outliers | Growth/Decay | Best Data Shape / Domain |
|-------------|---------------------|--------------------|---------------------|--------------|--------------------------|
| Monomial    | Poor                | No                 | No                  | Poor         | Any simple trend         |
| Chebyshev   | Good                | No                 | No                  | Poor         | Smooth curves, bounded   |
| Legendre    | Fair                | No                 | No                  | Poor         | Smooth curves, bounded   |
| Hermite     | Good                | No                 | Yes                 | Yes          | Bell-shaped, any range   |
| Laguerre    | Good                | No                 | Yes                 | Yes          | Decaying, positive-only  |
| Fourier     | Fair                | Yes                | No                  | Poor         | Periodic signals         |
| Logarithmic | Fair                | No                 | Yes                 | Yes          | Logarithmic growth       |
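
As a sketch, an invocation might look like this; the exact arguments are my assumption (check the basis_select! docs for the real signature), though Aic is the same kind of scoring method new_auto accepts:

// Hypothetical invocation: fits every supported basis to `data`,
// scores each one with Aic, then shows and plots the best 3
polyfit::basis_select!(&data, &Aic);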

Plotting & Data Visualization

[ More Information ] [ Example ]

After I built this library I realized that debugging curve fits is a pain without visualization. So I added some plotting utilities, and made my whole test suite generate plots on failure.

Use the plot! macro to create plots from data and fits.

  • You can plot data, functions, and multiple things at once, and there's a lot of customization available

Symbolic Calculus

[ More Information ] [ Example ]

Polyfit supports indefinite integration and differentiation in many bases, as well as root finding for all bases:

Note: Exact root finding is a faster method only available for some bases

| Basis                | Exact Root Finding | Derivative | Integral (Indefinite) | As Monomial |
|----------------------|--------------------|------------|-----------------------|-------------|
| Monomial             | Yes                | Yes        | Yes                   | Yes         |
| Chebyshev (1st form) | Yes                | Yes        | No                    | Yes         |
| Chebyshev (2nd form) | Yes                | Yes        | Yes                   | No          |
| Chebyshev (3rd form) | Yes                | No         | Yes                   | No          |
| Legendre             | No                 | Yes        | No                    | Yes         |
| Laguerre             | No                 | Yes        | No                    | Yes         |
| Hermite (Both kinds) | No                 | Yes        | No                    | Yes         |
| Fourier (sin/cos)    | No                 | Yes        | Yes                   | No          |
| Exponential (e^{λx}) | No                 | Yes        | Yes                   | No          |
| Logarithmic (ln^n x) | No                 | No         | No                    | No          |

Monomial Conversions for Calculus

Most bases without calculus support implement IntoMonomialBasis, which allows them to be converted into a monomial for calculus operations, or for the recognizable formula (for example y(x) = 3x³ + 2x² + 1).

The exception is logarithmic series, which cannot be converted into monomials. For those, use Polynomial::project, which can be a good way to approximate over certain ranges.
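
A rough sketch of the conversion; IntoMonomialBasis is the real trait, but the method name below is an assumption on my part:

// Fit in a numerically stable basis, then convert to a monomial
// for calculus or for the familiar formula shape
let fit = ChebyshevFit::new(&data, 3)?;
let monomial = fit.as_polynomial().into_monomial(); // hypothetical method name
println!("{}", monomial); // e.g. y(x) = 3x³ + 2x² + 1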

Derivatives for Trend Analysis

The derivative describes the rate of change at specific points in a dataset. For example, the derivative of position with respect to time is velocity, and the derivative of velocity with respect to time is acceleration.

  • Find local minimums and maximums
  • Discover if a function changes direction
  • Analyze rates of change in your data

Integrals for Total Values

The integral represents the accumulation of quantities, such as total distance traveled or total growth over time (a combined sketch follows the list below).

  • Calculate total accumulated values
  • Calculate area under curves
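
A combined sketch of both operations; the derivative() and integral() method names are assumptions, so check the calculus docs for the real API:

// Monomials support both operations, per the table above
polyfit::function!(f(x) = 2 x^2 + 3 x - 5);

let rate_of_change = f.derivative(); // hypothetical: e.g. velocity from position
let accumulated = f.integral();      // hypothetical: e.g. total distance traveled
println!("f'(x) = {}", rate_of_change);
println!("F(x) = {}", accumulated);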

Analysis & Mathematics

[ More Information ] [ Example ]

Testing & Validation

[ More Information ] [ Example ]

Being able to use a fitting library without stats knowledge isn't very useful if you can't test and validate your fits.

To that end I built in a testing framework to make it easy to validate your fits and make sure everything is working as expected (a quick sketch follows the list below).

  • Generate synthetic datasets with configurable noise and transforms
  • Macros that generate a plot of your fit and data on test failure for easy debugging
  • Built-in assertions for common fit properties and more complex statistical tests
  • Sane auto-defaults so you don't need to guess parameters
  • Seed management! Any random transform logs the seeds it used, so you can reproduce failures easily
  • I even added transforms::SeedSource::from_seeds to replay tests with the same random data every time after a failure
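
A quick sketch built entirely from pieces shown elsewhere in this README (function!, solve_range, apply_poisson_noise, and the assertion macros):

// Generate a synthetic dataset with configurable noise...
polyfit::function!(f(x) = x^2 - 4 x + 1);
let data = f.solve_range(0.0..=10.0, 0.5)
    .apply_poisson_noise(Strength::Absolute(0.1), None); // the seed used is logged for replay

// ...then validate a fit against it. On failure, a plot is generated.
let fit = ChebyshevFit::new_auto(&data, DegreeBound::Relaxed, &Aic)?;
polyfit::assert_r_squared!(fit);
polyfit::assert_residuals_normal!(fit);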

Data Manipulation

[ More Information ] [ Example ] [ Another Example ]

I built in an optional transforms feature with 3 different classes of transforms:

Normalization Transforms

Choose from Domain, Clipping, Mean Subtraction, or Z-score normalization to preprocess your data before fitting (the Z-score math is sketched below).

  • Great for improving numerical stability and fit quality, especially for high degree polynomials or large ranges of data.
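
For intuition, Z-score normalization recenters data on its mean and rescales by its standard deviation. A standalone sketch of the math itself, not the crate's API:

// z = (y - mean) / std_dev, applied to each value
let ys = [4.0_f64, 8.0, 15.0, 16.0, 23.0, 42.0];
let mean = ys.iter().sum::<f64>() / ys.len() as f64;
let variance = ys.iter().map(|y| (y - mean).powi(2)).sum::<f64>() / ys.len() as f64;
let std_dev = variance.sqrt();
let normalized: Vec<f64> = ys.iter().map(|y| (y - mean) / std_dev).collect();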

Noise Transforms

Generate synthetic noisy datasets by adding one of several types of configurable noise (example below).

  • Each option has a plot showing its characteristics and the effect of each parameter.
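
For example, the Poisson and salt & pepper transforms used later in this README compose directly onto generated data:

// Start from a known function, then layer noise on top
polyfit::function!(f(x) = 2 x^2 + 3 x - 5);
let noisy = f.solve_range(0.0..=100.0, 1.0)
    .apply_poisson_noise(Strength::Absolute(0.1), None) // signal-dependent noise
    .apply_salt_pepper_noise(0.05, Strength::Absolute(-50.0), Strength::Absolute(50.0), None); // sparse spikes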

Scaling Transforms

Transform data by applying scaling functions; this can improve fit quality or be used for data conversions (a hypothetical sketch follows the list).

  • Shift
  • Linear Scale
  • Quadratic Scale
  • Cubic Scale
  • Exponential Scale
  • Logarithmic Scale
  • Or a polynomial scale, where you supply a function or fit to apply
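
A hypothetical sketch; the method names below are assumptions patterned after the noise transforms, so check the transforms docs for the real ones:

// Hypothetical method names: remove a fixed offset, then apply a gain,
// e.g. converting raw sensor counts into calibrated units
let calibrated = raw_data
    .apply_shift(-3.2)        // hypothetical
    .apply_linear_scale(0.5); // hypothetical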

Crate Features

The parts of the library that require extra dependencies are behind feature flags, so you can keep your binary size down if you don't need them:

plotting

  • Active by default
  • All testing macros in the test module will generate plots on failure
  • Enables the plotting module for generating plots of fits and polynomials
  • Includes the plot! macro for easy plotting
  • Includes the plotting::Plot type for more customized plots

transforms

  • Active by default
  • Enables the transforms module for data transformation tools
    • Add noise to data (Gaussian, Poisson, Impulse, Correlated, Salt & Pepper)
    • Scale data (Shift, Linear, Quadratic, Cubic)
    • Normalize data (Domain, Clip, Mean, Z-score)

parallel

  • Inactive by default
  • Enables parallel processing using rayon
    • CurveFit::new_auto uses parallel processing to speed up fitting when this feature is enabled
    • All curve fits can use a faster solver for large enough datasets when this feature is enabled
    • May significantly speed up fitting for very large datasets on multi-core systems
    • Don't worry: your data is tested first to make sure the dataset is stable enough to benefit from parallelization

Basic Example

The simplest use-case is to find a mathematical function to help approximate a set of data:

//
// Load data from a file
let data = include_str!("../examples/sample_data.json");
let data: Vec<(f64, f64)> = serde_json::from_str(data).unwrap();

//
// Now we can create a curve fit to this data
//
// `ChebyshevFit` is a type alias for `CurveFit<ChebyshevBasis>`
// `ChebyshevBasis` is a robust choice for many types of data; it's more numerically stable and performs better at higher degrees
// It is one of several bases available, each with their own strengths and weaknesses. `basis_select!` can help you choose the best one for your data
let fit = ChebyshevFit::new_auto(
    &data,                 // The data to fit to
    DegreeBound::Relaxed,  // How picky we are about the degree of the polynomial (See [`statistics::DegreeBound`])
    &Aic                   // The method used to score the fit quality (See [`crate::score`])
)?;

//
// Of course, if we don't test our code, how do we know it works? Please don't assume I remembered all the edge cases
// If these assertions fail, a plot will be generated showing you exactly what went wrong!
//

// Here we test using r_squared, which is a measure of how well the curve explains how wiggly the data is
// An r_squared of 1.0 is a perfect fit
assert_r_squared!(fit);

// Here we check that the residuals (the difference between the data and the fit) are normally distributed
// Think of it like making sure the errors are random, not based on some undiscovered pattern
assert_residuals_normal!(fit);

Performance

The crate is designed to be fast and efficient, and includes benchmarks to help test that performance is linear with respect to the number of data points and polynomial degree.

A 3rd degree fit (1,000 points, Chebyshev basis) takes about 24µs in my benchmarks. Going up to 100,000 points takes about 4.5ms, scaling roughly linearly with the number of points, as expected.

With parallelization turned on, a 100 million point fit for a 3rd degree Chebyshev took about 1.18s on my machine (8 cores @ 2.2GHz, 32GB RAM).

The same linear scaling can be seen with polynomial degree (1,000 points, Chebyshev basis); 11µs for degree 1, up to 44µs for degree 5.

Auto-fit is also reasonably fast; 1,000 points, Chebyshev basis, and 9 candidate degrees takes about 330µs, or 600µs with parallelization disabled.

There are also performance differences between bases:

1. Chebyshev is the fastest, due to the stability of the matrix and the recurrence I use (~24µs for degree 3, 1,000 points)
2. Hermite, Legendre, and Laguerre are fairly close to that (~30µs for degree 3, 1,000 points)
3. Monomials perform worse than more stable bases (~53µs for degree 3, 1,000 points)
4. Fourier and Logarithmic are around the same, due to the trigonometric/logarithmic calculations (~57µs for degree 3, 1,000 points)

The benchmarks actually use my library to test that the scaling is linear - which I think is a pretty cool use-case:

let linear_fit = MonomialFit::new(&data, 1)?; // Create a linear fit (degree=1)
polyfit::assert_r_squared!(linear_fit);       // Assert that the linear fit explains the data well (r² > 0.9)

// If the assertion fails, a plot will be generated showing you exactly what went wrong

Raw benchmark results:

Benchmarking fit vs n (Chebyshev, Degree=3)
fit_vs_n/n=100                  [3.3817 µs 3.4070 µs 3.4363 µs]
fit_vs_n/n=1_000                [23.791 µs 23.926 µs 24.098 µs]
fit_vs_n/n=10_000               [302.99 µs 304.40 µs 306.01 µs]
fit_vs_n/n=100_000              [4.5086 ms 4.5224 ms 4.5376 ms]
fit_vs_n/n=1_000_000            [12.471 ms 12.592 ms 12.725 ms]
fit_vs_n/n=10_000_000           [115.49 ms 116.76 ms 118.07 ms]
fit_vs_n/n=100_000_000          [1.1768 s 1.1838 s 1.1908 s]

Benchmarking fit vs degree (Chebyshev, n=1000)
fit_vs_degree/Degree=1          [11.587 µs 11.691 µs 11.802 µs]
fit_vs_degree/Degree=2          [18.109 µs 18.306 µs 18.505 µs]
fit_vs_degree/Degree=3          [24.672 µs 24.954 µs 25.269 µs]
fit_vs_degree/Degree=4          [33.074 µs 33.206 µs 33.368 µs]
fit_vs_degree/Degree=5          [44.399 µs 44.887 µs 45.401 µs]
fit_vs_degree/Degree=10         [126.07 µs 127.26 µs 128.62 µs]
fit_vs_degree/Degree=20         [420.20 µs 423.44 µs 426.93 µs]

Benchmarking fit vs basis (Degree=3, n=1000)
fit_vs_basis/Monomial           [53.513 µs 53.980 µs 54.450 µs]
fit_vs_basis/Chebyshev          [24.307 µs 24.504 µs 24.710 µs]
fit_vs_basis/Legendre           [27.577 µs 80.104 µs 80.714 µs]
fit_vs_basis/Hermite            [30.496 µs 30.872 µs 31.321 µs]
fit_vs_basis/Laguerre           [31.146 µs 31.428 µs 31.734 µs]
fit_vs_basis/Fourier            [56.421 µs 56.985 µs 57.612 µs]

Benchmarking auto fit vs basis (n=1000, Candidates=9)
auto_fit_vs_basis/Monomial      [497.33 µs 500.27 µs 503.30 µs]
auto_fit_vs_basis/Chebyshev     [327.75 µs 329.67 µs 331.76 µs]
auto_fit_vs_basis/Legendre      [1.6993 ms 1.7061 ms 1.7128 ms]
auto_fit_vs_basis/Hermite       [337.65 µs 339.90 µs 342.44 µs]
auto_fit_vs_basis/Laguerre      [428.36 µs 431.26 µs 434.09 µs]
auto_fit_vs_basis/Fourier       [710.46 µs 713.07 µs 715.85 µs]
auto_fit_vs_basis/Logarithmic   [525.36 µs 528.21 µs 531.10 µs]

For transparency I ran the same benchmarks in numpy (benches/numpy_bench.py):

  • Degree tests use 1,000 points; n tests use degree 3. All use the Chebyshev basis for comparison

fit_vs_n/n=100         [465.15µs]
fit_vs_n/n=1000        [204.68µs]
fit_vs_n/n=10000       [953.67µs]
fit_vs_n/n=100000      [9847.16µs]

fit_vs_degree/degree=1 [122.67µs]
fit_vs_degree/degree=2 [158.31µs]
fit_vs_degree/degree=3 [198.84µs]
fit_vs_degree/degree=4 [243.07µs]
fit_vs_degree/degree=5 [510.57µs]

More Examples

Oh no! I have some data but I need to try and predict some other value!

//
// I don't have any real data, so I'm still going to make some up!
// `function!` is a macro that makes it easy to define polynomials for testing
// `apply_poisson_noise` is part of the `transforms` module, which provides a set of tools for manipulating data
polyfit::function!(f(x) = 2 x^2 + 3 x - 5);
let synthetic_data = f.solve_range(0.0..=100.0, 1.0).apply_poisson_noise(Strength::Absolute(0.1), None);

//
// Now we can create a curve fit to this data
// Monomials don't like high degrees, so we will use a more conservative degree bound
// `Monomials` are the simplest polynomial basis, and the one most people are familiar with; it looks like `1x^2 + 2x + 3`
let fit = MonomialFit::new_auto(&synthetic_data, DegreeBound::Conservative, &Aic).expect("Failed to create fit");

//
// Now we can make some predictions!
//

let bad_and_silly_prediction = fit.y(150.0); // This is outside the range of the data, so it is probably nonsense
                                             // Fits are usually only valid within the range of the data they were created from
                                             // Violating that is called extrapolation, which we generally want to avoid
                                             // This will return an error!

let bad_prediction_probably = fit.as_polynomial().y(150.0); // This is outside the range of the data, but you asked for it specifically        
                                                            // Unlike a CurveFit, a Polynomial is just a mathematical function - no seatbelts
                                                            // This will return a value, but it is probably nonsense

let good_prediction = fit.y(50.0); // This is within the range of the data, so it is probably reasonable
                                   // This will return Ok(value)

//
// Maybe we need to make sure our predictions are good enough
// A covariance matrix sounds terrifying. Because it is.
// Which is why I do it for you - covariance matrices measure uncertainty about your fit
// They can be used to calculate confidence intervals for predictions
// They can also be used to find outliers in your data
let covariance = fit.covariance().expect("Failed to calculate covariance");
let confidence_band = covariance.confidence_band(
    50.0,                          // Confidence band for x=50
    Confidence::P95,               // Find the range where we expect 95% of points to fall within
    Some(Tolerance::Variance(0.1)) // Tolerate some extra noise in the data (10% of variance of the data, in this case)
).unwrap(); // 95% confidence band
println!("I am 95% confident that the true value at x=50.0 is between {} and {}", confidence_band.min(), confidence_band.max());

Oh dear! I sure do wish I could find which pieces of data are outliers!

//
// I still don't have any real data, so I'm going to make some up! Again!
polyfit::function!(f(x) = 2 x^2 + 3 x - 5);
let synthetic_data = f.solve_range(0.0..=100.0, 1.0).apply_poisson_noise(Strength::Absolute(0.1), None);

//
// Let's add some outliers
// Salt and pepper noise is a simple way to do this; She's good n' spiky
// We will get nice big jumps of +/- 50 in 5% of the data points
let synthetic_data_with_outliers = synthetic_data.apply_salt_pepper_noise(0.05, Strength::Absolute(-50.0), Strength::Absolute(50.0), None);

//
// Now we can create a curve fit to this data, like before
let fit = MonomialFit::new_auto(&synthetic_data_with_outliers, DegreeBound::Conservative, &Aic).expect("Failed to create fit");

//
// Now we can find the outliers!
// These are the points that fall outside the 95% confidence interval
// This means that they are outside the range where we expect 95% of the data to fall
// The `Some(0.1)` means we tolerate some noise in the data, so we don't flag points that are just a little bit off
// The noise tolerance is a fraction of the variance of the data (10% in this case)
//
// If we had a sensor that specified a tolerance of ±5 units, we could use `Some(Tolerance::Absolute(5.0))` instead
// If we had a sensor that specified a tolerance of 10% of the reading, we could use `Some(Tolerance::Measurement(0.1))` instead
let outliers = fit.covariance().unwrap().outliers(Confidence::P95, Some(Tolerance::Variance(0.1))).unwrap();