Optimize com_psf by cgarling · Pull Request #59 · JuliaAstro/Astroalign.jl

cgarling · 2026-05-20T20:02:59Z

If you copy-paste the revised function definitions from this PR into your REPL you can run the following comparison which will show the new implementation is ~3x faster and allocation free EDIT: If you have to collect the input (as we do because of how Photometry.jl passes us the cutout) then it will be 2 allocations and adds ~100 ns to the base case speed. I chose to do com_psf(T::Type{<:AbstractFloat}, img_ap, rel_thresh) because Float64 is not really any slower on my testing so this way it makes it easier for us to switch to Float64 in the future if we every wanted to.

using PSFModels: gaussian
using BenchmarkTools
import Astroalign

const x = 1:20
const y = 1:20
const T = Float32
model(x, y, amp) = gaussian(T, 4, 4; x, y, fwhm=3) * amp
data = model.(x, y', 10) .+ T(0.1) * randn(T, length(x), length(y))
result_orig = Astroalign.com_psf(data; rel_thresh=0.1f0)
result_revised = com_psf(data; rel_thresh=0.1f0)
println("Original COM: ", (result_orig.psf_params.x, result_orig.psf_params.y))
println("Revised COM: ", (result_revised.psf_params.x, result_revised.psf_params.y))
println("Original FWHM: ", result_orig.psf_params.fwhm)
println("Revised FWHM: ", result_revised.psf_params.fwhm)
println("Original benchmark: ")
display(@benchmark Astroalign.com_psf($data; rel_thresh=$0.1f0))
println("Revised benchmark: ")
display(@benchmark com_psf($data; rel_thresh=$0.1f0))

My result:

EDIT: Because Photometry.jl will pass us an object from Transducers.jl we have to call collect, giving us 2 allocations and runtime +100 ns over not collecting (if we had a pure matrix input).

julia> println("Original COM: ", (result_orig.psf_params.x, result_orig.psf_params.y))
Original COM: (3.9974918f0, 3.9858212f0)

julia> println("Revised COM: ", (result_revised.psf_params.x, result_revised.psf_params.y))
Revised COM: (3.9974945f0, 3.9858336f0)

julia> println("Original FWHM: ", result_orig.psf_params.fwhm)
Original FWHM: (2.31581660320762, 2.3766457470426725)

julia> println("Revised FWHM: ", result_revised.psf_params.fwhm)
Revised FWHM: (2.3164616f0, 2.3772223f0)

julia> display(@benchmark Astroalign.com_psf($data; rel_thresh=$0.1f0))
BenchmarkTools.Trial: 10000 samples with 10 evaluations per sample.
 Range (min … max):  1.408 μs …  1.050 ms  ┊ GC (min … max):  0.00% … 99.19%
 Time  (median):     1.544 μs              ┊ GC (median):     0.00%
 Time  (mean ± σ):   2.372 μs ± 15.628 μs  ┊ GC (mean ± σ):  21.39% ±  4.62%

  ▆█▇▅▃▁▁▁                ▃▄▃▂                               ▂
  █████████▇▇▆▇▇▇▇▆▇▆▆▆▇▆██████▆▆▆▅▄▅▄▅▅▃▅▆███▇▅▄▄▄▁▁▄▁▆▅▇██ █
  1.41 μs      Histogram: log(frequency) by time     5.56 μs <

 Memory estimate: 10.41 KiB, allocs estimate: 20.

julia> println("Revised benchmark: ")
Revised benchmark:

julia> @benchmark com_psf($data; rel_thresh=$0.1f0)
BenchmarkTools.Trial: 10000 samples with 183 evaluations per sample.
 Range (min … max):  571.716 ns …   9.490 μs  ┊ GC (min … max): 0.00% … 88.05%
 Time  (median):     600.563 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   678.928 ns ± 532.693 ns  ┊ GC (mean ± σ):  8.48% ±  9.61%

  █▅▂▁▁                                                         ▁
  ██████▇▇▆▆▄▅▁▄▃▁▃▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▃▄▃▄▄▄▄▄▄▄▃▄▅ █
  572 ns        Histogram: log(frequency) by time        4.5 μs <

 Memory estimate: 1.64 KiB, allocs estimate: 2.

Needed because Photometry.jl uses Transducers.jl

I did not understand that `collect(img_ap)` returns a different shape than `collect(T, img_ap)`.

codecov · 2026-05-20T21:32:34Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.49%. Comparing base (02a35cd) to head (6fc8e45).

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #59      +/-   ##
==========================================
+ Coverage   99.45%   99.49%   +0.03%     
==========================================
  Files           5        5              
  Lines         185      197      +12     
==========================================
+ Hits          184      196      +12     
  Misses          1        1

Flag	Coverage Δ
unittests	`99.49% <100.00%> (+0.03%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

In JuliaAstro/Astroalign.jl#59 we encounter an issue where the Transducers.Eduction that is passed into the function `f` keyword argument of `photometry()` is not appropriate for further analysis as the lazy iterator does not support `axes` and other similar methods without first being `collect`ed. This PR aims to remove the need to call `collect` downstream by making sure that matrix-like objects are passed into the `f` keyword argument of `photometry`. Here I replace the Transducers.Map call with a new `WeightedApertureCutout` struct that represents a lazy cutout. By defining a custom struct we can ensure all methods (`getindex`, `axes`, etc) we need can be supported. Here's a benchmark I wrote testing the performance with CircularAperture at a variety of radii. Currently this is a *touch* slower for small apertures, not sure why but open to suggestions for improvements ```julia using BenchmarkTools using StableRNGs using Photometry using PrettyTables: pretty_table function show_benchmarks(results) # Collect results sorted = sort(collect(results), by = x -> parse(Float64, split(x[1], " ")[1])) names = [k for (k,_) in sorted] trials = [v for (_,v) in sorted] # Pack into matrix data = hcat( names, [BenchmarkTools.prettytime(median(t).time) for t in trials], [BenchmarkTools.prettymemory(median(t).memory) for t in trials], [median(t).allocs for t in trials] ) # Make pretty table pretty_table(data; column_labels = ["Benchmark", "Median Time", "Memory", "Allocs"], alignment = [:l, :r, :r, :r] ) end const SUITE = BenchmarkGroup() const data = randn(StableRNG(1234), 512, 512) .+ 10 const err = fill(1.0, size(data)) SUITE["circular_aperture"] = BenchmarkGroup() rows = map(range(1, 100; length=10)) do r ap = CircularAperture(256.5, 256.5, r) SUITE["circular_aperture"]["$r"] = @benchmarkable photometry($ap, $data) SUITE["circular_aperture"]["$r + error"] = @benchmarkable photometry($ap, $data, $err) end if get(ENV, "CI", "false") == "false" results = run(SUITE, verbose=true) show_benchmarks(results["circular_aperture"]) end ``` On main ```text ┌────────────────────────────┬─────────────┬─────────┬────────┐ │ Benchmark │ Median Time │ Memory │ Allocs │ ├────────────────────────────┼─────────────┼─────────┼────────┤ │ 1.0 │ 131.130 ns │ 0 bytes │ 0 │ │ 1.0 + error │ 313.960 ns │ 0 bytes │ 0 │ │ 1.6681005372000588 │ 209.360 ns │ 0 bytes │ 0 │ │ 1.6681005372000588 + error │ 696.810 ns │ 0 bytes │ 0 │ │ 2.7825594022071245 │ 361.820 ns │ 0 bytes │ 0 │ │ 2.7825594022071245 + error │ 1.232 μs │ 0 bytes │ 0 │ │ 4.641588833612779 │ 703.750 ns │ 0 bytes │ 0 │ │ 4.641588833612779 + error │ 2.385 μs │ 0 bytes │ 0 │ │ 7.74263682681127 │ 1.349 μs │ 0 bytes │ 0 │ │ 7.74263682681127 + error │ 4.469 μs │ 0 bytes │ 0 │ │ 12.915496650148839 │ 2.813 μs │ 0 bytes │ 0 │ │ 12.915496650148839 + error │ 8.728 μs │ 0 bytes │ 0 │ │ 21.54434690031884 │ 6.754 μs │ 0 bytes │ 0 │ │ 21.54434690031884 + error │ 19.970 μs │ 0 bytes │ 0 │ │ 35.938136638046274 │ 15.006 μs │ 0 bytes │ 0 │ │ 35.938136638046274 + error │ 44.191 μs │ 0 bytes │ 0 │ │ 59.948425031894104 │ 36.298 μs │ 0 bytes │ 0 │ │ 59.948425031894104 + error │ 104.733 μs │ 0 bytes │ 0 │ │ 100.0 │ 91.307 μs │ 0 bytes │ 0 │ │ 100.0 + error │ 257.268 μs │ 0 bytes │ 0 │ └────────────────────────────┴─────────────┴─────────┴────────┘ ``` On this branch: ```text ┌────────────────────────────┬─────────────┬─────────┬────────┐ │ Benchmark │ Median Time │ Memory │ Allocs │ ├────────────────────────────┼─────────────┼─────────┼────────┤ │ 1.0 │ 168.410 ns │ 0 bytes │ 0 │ │ 1.0 + error │ 346.250 ns │ 0 bytes │ 0 │ │ 1.6681005372000588 │ 211.520 ns │ 0 bytes │ 0 │ │ 1.6681005372000588 + error │ 682.940 ns │ 0 bytes │ 0 │ │ 2.7825594022071245 │ 366.230 ns │ 0 bytes │ 0 │ │ 2.7825594022071245 + error │ 1.201 μs │ 0 bytes │ 0 │ │ 4.641588833612779 │ 711.350 ns │ 0 bytes │ 0 │ │ 4.641588833612779 + error │ 2.278 μs │ 0 bytes │ 0 │ │ 7.74263682681127 │ 1.368 μs │ 0 bytes │ 0 │ │ 7.74263682681127 + error │ 4.272 μs │ 0 bytes │ 0 │ │ 12.915496650148839 │ 2.809 μs │ 0 bytes │ 0 │ │ 12.915496650148839 + error │ 8.302 μs │ 0 bytes │ 0 │ │ 21.54434690031884 │ 6.591 μs │ 0 bytes │ 0 │ │ 21.54434690031884 + error │ 18.664 μs │ 0 bytes │ 0 │ │ 35.938136638046274 │ 14.605 μs │ 0 bytes │ 0 │ │ 35.938136638046274 + error │ 40.901 μs │ 0 bytes │ 0 │ │ 59.948425031894104 │ 34.760 μs │ 0 bytes │ 0 │ │ 59.948425031894104 + error │ 95.657 μs │ 0 bytes │ 0 │ │ 100.0 │ 86.298 μs │ 0 bytes │ 0 │ │ 100.0 + error │ 234.108 μs │ 0 bytes │ 0 │ └────────────────────────────┴─────────────┴─────────┴────────┘ ``` --------- Co-authored-by: Ian Weaver <weaveric@gmail.com>

cgarling added 3 commits May 20, 2026 15:54

optimize com_psf

75f4d1f

fix for inputs that haven't been collected yet

326f810

Needed because Photometry.jl uses Transducers.jl

just force collect, idk how transducers works

064899a

cgarling marked this pull request as draft May 20, 2026 20:28

fix com_psf with collect(T, img_ap)

f0da383

I did not understand that `collect(img_ap)` returns a different shape than `collect(T, img_ap)`.

cgarling marked this pull request as ready for review May 20, 2026 21:30

cgarling requested a review from icweaver May 20, 2026 21:36

cgarling mentioned this pull request May 21, 2026

Replace Transducers with lazy matrix cutout JuliaAstro/Photometry.jl#113

Merged

cgarling and others added 3 commits May 20, 2026 23:01

Merge branch 'main' into optimize_com

f591550

Merge branch 'main' into optimize_com

6fc8e45

[skip ci] factor out reused expression for clarity

e43bfa2

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize com_psf#59

Optimize com_psf#59
cgarling wants to merge 7 commits into
mainfrom
optimize_com

cgarling commented May 20, 2026 •

edited

Loading

Uh oh!

codecov Bot commented May 20, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

cgarling commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov Bot commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

cgarling commented May 20, 2026 •

edited

Loading

codecov Bot commented May 20, 2026 •

edited

Loading