Skip to content

Optimize com_psf#59

Open
cgarling wants to merge 7 commits into
mainfrom
optimize_com
Open

Optimize com_psf#59
cgarling wants to merge 7 commits into
mainfrom
optimize_com

Conversation

@cgarling
Copy link
Copy Markdown
Member

@cgarling cgarling commented May 20, 2026

If you copy-paste the revised function definitions from this PR into your REPL you can run the following comparison which will show the new implementation is ~3x faster and allocation free EDIT: If you have to collect the input (as we do because of how Photometry.jl passes us the cutout) then it will be 2 allocations and adds ~100 ns to the base case speed. I chose to do com_psf(T::Type{<:AbstractFloat}, img_ap, rel_thresh) because Float64 is not really any slower on my testing so this way it makes it easier for us to switch to Float64 in the future if we every wanted to.

using PSFModels: gaussian
using BenchmarkTools
import Astroalign

const x = 1:20
const y = 1:20
const T = Float32
model(x, y, amp) = gaussian(T, 4, 4; x, y, fwhm=3) * amp
data = model.(x, y', 10) .+ T(0.1) * randn(T, length(x), length(y))
result_orig = Astroalign.com_psf(data; rel_thresh=0.1f0)
result_revised = com_psf(data; rel_thresh=0.1f0)
println("Original COM: ", (result_orig.psf_params.x, result_orig.psf_params.y))
println("Revised COM: ", (result_revised.psf_params.x, result_revised.psf_params.y))
println("Original FWHM: ", result_orig.psf_params.fwhm)
println("Revised FWHM: ", result_revised.psf_params.fwhm)
println("Original benchmark: ")
display(@benchmark Astroalign.com_psf($data; rel_thresh=$0.1f0))
println("Revised benchmark: ")
display(@benchmark com_psf($data; rel_thresh=$0.1f0))

My result:

EDIT: Because Photometry.jl will pass us an object from Transducers.jl we have to call collect, giving us 2 allocations and runtime +100 ns over not collecting (if we had a pure matrix input).

julia> println("Original COM: ", (result_orig.psf_params.x, result_orig.psf_params.y))
Original COM: (3.9974918f0, 3.9858212f0)

julia> println("Revised COM: ", (result_revised.psf_params.x, result_revised.psf_params.y))
Revised COM: (3.9974945f0, 3.9858336f0)

julia> println("Original FWHM: ", result_orig.psf_params.fwhm)
Original FWHM: (2.31581660320762, 2.3766457470426725)

julia> println("Revised FWHM: ", result_revised.psf_params.fwhm)
Revised FWHM: (2.3164616f0, 2.3772223f0)

julia> display(@benchmark Astroalign.com_psf($data; rel_thresh=$0.1f0))
BenchmarkTools.Trial: 10000 samples with 10 evaluations per sample.
 Range (min  max):  1.408 μs   1.050 ms  ┊ GC (min  max):  0.00%  99.19%
 Time  (median):     1.544 μs              ┊ GC (median):     0.00%
 Time  (mean ± σ):   2.372 μs ± 15.628 μs  ┊ GC (mean ± σ):  21.39% ±  4.62%

  ▆█▇▅▃▁▁▁                ▃▄▃▂                               ▂
  █████████▇▇▆▇▇▇▇▆▇▆▆▆▇▆██████▆▆▆▅▄▅▄▅▅▃▅▆███▇▅▄▄▄▁▁▄▁▆▅▇██ █
  1.41 μs      Histogram: log(frequency) by time     5.56 μs <

 Memory estimate: 10.41 KiB, allocs estimate: 20.

julia> println("Revised benchmark: ")
Revised benchmark:

julia> @benchmark com_psf($data; rel_thresh=$0.1f0)
BenchmarkTools.Trial: 10000 samples with 183 evaluations per sample.
 Range (min  max):  571.716 ns    9.490 μs  ┊ GC (min  max): 0.00%  88.05%
 Time  (median):     600.563 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   678.928 ns ± 532.693 ns  ┊ GC (mean ± σ):  8.48% ±  9.61%

  █▅▂▁▁                                                         ▁
  ██████▇▇▆▆▄▅▁▄▃▁▃▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▃▄▃▄▄▄▄▄▄▄▃▄▅ █
  572 ns        Histogram: log(frequency) by time        4.5 μs <

 Memory estimate: 1.64 KiB, allocs estimate: 2.

@cgarling cgarling marked this pull request as draft May 20, 2026 20:28
I did not understand that `collect(img_ap)` returns a different shape than `collect(T, img_ap)`.
@cgarling cgarling marked this pull request as ready for review May 20, 2026 21:30
@codecov
Copy link
Copy Markdown

codecov Bot commented May 20, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.49%. Comparing base (02a35cd) to head (6fc8e45).

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #59      +/-   ##
==========================================
+ Coverage   99.45%   99.49%   +0.03%     
==========================================
  Files           5        5              
  Lines         185      197      +12     
==========================================
+ Hits          184      196      +12     
  Misses          1        1              
Flag Coverage Δ
unittests 99.49% <100.00%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

cgarling added a commit to JuliaAstro/Photometry.jl that referenced this pull request May 23, 2026
In JuliaAstro/Astroalign.jl#59 we encounter an
issue where the Transducers.Eduction that is passed into the function
`f` keyword argument of `photometry()` is not appropriate for further
analysis as the lazy iterator does not support `axes` and other similar
methods without first being `collect`ed. This PR aims to remove the need
to call `collect` downstream by making sure that matrix-like objects are
passed into the `f` keyword argument of `photometry`.

Here I replace the Transducers.Map call with a new
`WeightedApertureCutout` struct that represents a lazy cutout. By
defining a custom struct we can ensure all methods (`getindex`, `axes`,
etc) we need can be supported.

Here's a benchmark I wrote testing the performance with CircularAperture
at a variety of radii. Currently this is a *touch* slower for small
apertures, not sure why but open to suggestions for improvements

```julia
using BenchmarkTools
using StableRNGs
using Photometry
using PrettyTables: pretty_table

function show_benchmarks(results)
    # Collect results
    sorted  = sort(collect(results), by = x -> parse(Float64, split(x[1], " ")[1]))
    names   = [k for (k,_) in sorted]
    trials  = [v for (_,v) in sorted]

    # Pack into matrix
    data = hcat(
        names,
        [BenchmarkTools.prettytime(median(t).time) for t in trials],
        [BenchmarkTools.prettymemory(median(t).memory) for t in trials],
        [median(t).allocs for t in trials]
    )

    # Make pretty table
    pretty_table(data;
        column_labels = ["Benchmark", "Median Time", "Memory", "Allocs"],
        alignment     = [:l, :r, :r, :r]
    )
end

const SUITE = BenchmarkGroup()

const data = randn(StableRNG(1234), 512, 512) .+ 10
const err = fill(1.0, size(data))

SUITE["circular_aperture"] = BenchmarkGroup()

rows = map(range(1, 100; length=10)) do r
    ap = CircularAperture(256.5, 256.5, r)
    SUITE["circular_aperture"]["$r"] = @benchmarkable photometry($ap, $data)
    SUITE["circular_aperture"]["$r + error"] = @benchmarkable photometry($ap, $data, $err)
end

if get(ENV, "CI", "false") == "false"
    results = run(SUITE, verbose=true)
    show_benchmarks(results["circular_aperture"])
end
```

On main

```text
┌────────────────────────────┬─────────────┬─────────┬────────┐
│ Benchmark                  │ Median Time │  Memory │ Allocs │
├────────────────────────────┼─────────────┼─────────┼────────┤
│ 1.0                        │  131.130 ns │ 0 bytes │      0 │
│ 1.0 + error                │  313.960 ns │ 0 bytes │      0 │
│ 1.6681005372000588         │  209.360 ns │ 0 bytes │      0 │
│ 1.6681005372000588 + error │  696.810 ns │ 0 bytes │      0 │
│ 2.7825594022071245         │  361.820 ns │ 0 bytes │      0 │
│ 2.7825594022071245 + error │    1.232 μs │ 0 bytes │      0 │
│ 4.641588833612779          │  703.750 ns │ 0 bytes │      0 │
│ 4.641588833612779 + error  │    2.385 μs │ 0 bytes │      0 │
│ 7.74263682681127           │    1.349 μs │ 0 bytes │      0 │
│ 7.74263682681127 + error   │    4.469 μs │ 0 bytes │      0 │
│ 12.915496650148839         │    2.813 μs │ 0 bytes │      0 │
│ 12.915496650148839 + error │    8.728 μs │ 0 bytes │      0 │
│ 21.54434690031884          │    6.754 μs │ 0 bytes │      0 │
│ 21.54434690031884 + error  │   19.970 μs │ 0 bytes │      0 │
│ 35.938136638046274         │   15.006 μs │ 0 bytes │      0 │
│ 35.938136638046274 + error │   44.191 μs │ 0 bytes │      0 │
│ 59.948425031894104         │   36.298 μs │ 0 bytes │      0 │
│ 59.948425031894104 + error │  104.733 μs │ 0 bytes │      0 │
│ 100.0                      │   91.307 μs │ 0 bytes │      0 │
│ 100.0 + error              │  257.268 μs │ 0 bytes │      0 │
└────────────────────────────┴─────────────┴─────────┴────────┘
```

On this branch:

```text
┌────────────────────────────┬─────────────┬─────────┬────────┐
│ Benchmark                  │ Median Time │  Memory │ Allocs │
├────────────────────────────┼─────────────┼─────────┼────────┤
│ 1.0                        │  168.410 ns │ 0 bytes │      0 │
│ 1.0 + error                │  346.250 ns │ 0 bytes │      0 │
│ 1.6681005372000588         │  211.520 ns │ 0 bytes │      0 │
│ 1.6681005372000588 + error │  682.940 ns │ 0 bytes │      0 │
│ 2.7825594022071245         │  366.230 ns │ 0 bytes │      0 │
│ 2.7825594022071245 + error │    1.201 μs │ 0 bytes │      0 │
│ 4.641588833612779          │  711.350 ns │ 0 bytes │      0 │
│ 4.641588833612779 + error  │    2.278 μs │ 0 bytes │      0 │
│ 7.74263682681127           │    1.368 μs │ 0 bytes │      0 │
│ 7.74263682681127 + error   │    4.272 μs │ 0 bytes │      0 │
│ 12.915496650148839         │    2.809 μs │ 0 bytes │      0 │
│ 12.915496650148839 + error │    8.302 μs │ 0 bytes │      0 │
│ 21.54434690031884          │    6.591 μs │ 0 bytes │      0 │
│ 21.54434690031884 + error  │   18.664 μs │ 0 bytes │      0 │
│ 35.938136638046274         │   14.605 μs │ 0 bytes │      0 │
│ 35.938136638046274 + error │   40.901 μs │ 0 bytes │      0 │
│ 59.948425031894104         │   34.760 μs │ 0 bytes │      0 │
│ 59.948425031894104 + error │   95.657 μs │ 0 bytes │      0 │
│ 100.0                      │   86.298 μs │ 0 bytes │      0 │
│ 100.0 + error              │  234.108 μs │ 0 bytes │      0 │
└────────────────────────────┴─────────────┴─────────┴────────┘
```

---------

Co-authored-by: Ian Weaver <weaveric@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants