Optimize com_psf #59
Open
cgarling wants to merge 7 commits into
Needed because Photometry.jl uses Transducers.jl
I did not understand that `collect(img_ap)` returns a different shape than `collect(T, img_ap)`.
Codecov Report
✅ All modified and coverable lines are covered by tests.

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #59      +/-   ##
==========================================
+ Coverage   99.45%   99.49%   +0.03%
==========================================
  Files           5        5
  Lines         185      197      +12
==========================================
+ Hits          184      196      +12
  Misses          1        1
cgarling added a commit to JuliaAstro/Photometry.jl that referenced this pull request May 23, 2026
In JuliaAstro/Astroalign.jl#59 we encounter an issue where the Transducers.Eduction that is passed into the function `f` keyword argument of `photometry()` is not appropriate for further analysis as the lazy iterator does not support `axes` and other similar methods without first being `collect`ed. This PR aims to remove the need to call `collect` downstream by making sure that matrix-like objects are passed into the `f` keyword argument of `photometry`.

Here I replace the Transducers.Map call with a new `WeightedApertureCutout` struct that represents a lazy cutout. By defining a custom struct we can ensure all methods (`getindex`, `axes`, etc) we need can be supported.

Here's a benchmark I wrote testing the performance with CircularAperture at a variety of radii. Currently this is a *touch* slower for small apertures, not sure why but open to suggestions for improvements.

```julia
using BenchmarkTools
using StableRNGs
using Photometry
using PrettyTables: pretty_table

function show_benchmarks(results)
    # Collect results
    sorted = sort(collect(results), by = x -> parse(Float64, split(x[1], " ")[1]))
    names = [k for (k,_) in sorted]
    trials = [v for (_,v) in sorted]

    # Pack into matrix
    data = hcat(
        names,
        [BenchmarkTools.prettytime(median(t).time) for t in trials],
        [BenchmarkTools.prettymemory(median(t).memory) for t in trials],
        [median(t).allocs for t in trials]
    )

    # Make pretty table
    pretty_table(data;
        column_labels = ["Benchmark", "Median Time", "Memory", "Allocs"],
        alignment = [:l, :r, :r, :r]
    )
end

const SUITE = BenchmarkGroup()

const data = randn(StableRNG(1234), 512, 512) .+ 10
const err = fill(1.0, size(data))

SUITE["circular_aperture"] = BenchmarkGroup()
rows = map(range(1, 100; length=10)) do r
    ap = CircularAperture(256.5, 256.5, r)
    SUITE["circular_aperture"]["$r"] = @benchmarkable photometry($ap, $data)
    SUITE["circular_aperture"]["$r + error"] = @benchmarkable photometry($ap, $data, $err)
end

if get(ENV, "CI", "false") == "false"
    results = run(SUITE, verbose=true)
    show_benchmarks(results["circular_aperture"])
end
```

On main:

```text
┌────────────────────────────┬─────────────┬─────────┬────────┐
│ Benchmark                  │ Median Time │  Memory │ Allocs │
├────────────────────────────┼─────────────┼─────────┼────────┤
│ 1.0                        │  131.130 ns │ 0 bytes │      0 │
│ 1.0 + error                │  313.960 ns │ 0 bytes │      0 │
│ 1.6681005372000588         │  209.360 ns │ 0 bytes │      0 │
│ 1.6681005372000588 + error │  696.810 ns │ 0 bytes │      0 │
│ 2.7825594022071245         │  361.820 ns │ 0 bytes │      0 │
│ 2.7825594022071245 + error │    1.232 μs │ 0 bytes │      0 │
│ 4.641588833612779          │  703.750 ns │ 0 bytes │      0 │
│ 4.641588833612779 + error  │    2.385 μs │ 0 bytes │      0 │
│ 7.74263682681127           │    1.349 μs │ 0 bytes │      0 │
│ 7.74263682681127 + error   │    4.469 μs │ 0 bytes │      0 │
│ 12.915496650148839         │    2.813 μs │ 0 bytes │      0 │
│ 12.915496650148839 + error │    8.728 μs │ 0 bytes │      0 │
│ 21.54434690031884          │    6.754 μs │ 0 bytes │      0 │
│ 21.54434690031884 + error  │   19.970 μs │ 0 bytes │      0 │
│ 35.938136638046274         │   15.006 μs │ 0 bytes │      0 │
│ 35.938136638046274 + error │   44.191 μs │ 0 bytes │      0 │
│ 59.948425031894104         │   36.298 μs │ 0 bytes │      0 │
│ 59.948425031894104 + error │  104.733 μs │ 0 bytes │      0 │
│ 100.0                      │   91.307 μs │ 0 bytes │      0 │
│ 100.0 + error              │  257.268 μs │ 0 bytes │      0 │
└────────────────────────────┴─────────────┴─────────┴────────┘
```

On this branch:

```text
┌────────────────────────────┬─────────────┬─────────┬────────┐
│ Benchmark                  │ Median Time │  Memory │ Allocs │
├────────────────────────────┼─────────────┼─────────┼────────┤
│ 1.0                        │  168.410 ns │ 0 bytes │      0 │
│ 1.0 + error                │  346.250 ns │ 0 bytes │      0 │
│ 1.6681005372000588         │  211.520 ns │ 0 bytes │      0 │
│ 1.6681005372000588 + error │  682.940 ns │ 0 bytes │      0 │
│ 2.7825594022071245         │  366.230 ns │ 0 bytes │      0 │
│ 2.7825594022071245 + error │    1.201 μs │ 0 bytes │      0 │
│ 4.641588833612779          │  711.350 ns │ 0 bytes │      0 │
│ 4.641588833612779 + error  │    2.278 μs │ 0 bytes │      0 │
│ 7.74263682681127           │    1.368 μs │ 0 bytes │      0 │
│ 7.74263682681127 + error   │    4.272 μs │ 0 bytes │      0 │
│ 12.915496650148839         │    2.809 μs │ 0 bytes │      0 │
│ 12.915496650148839 + error │    8.302 μs │ 0 bytes │      0 │
│ 21.54434690031884          │    6.591 μs │ 0 bytes │      0 │
│ 21.54434690031884 + error  │   18.664 μs │ 0 bytes │      0 │
│ 35.938136638046274         │   14.605 μs │ 0 bytes │      0 │
│ 35.938136638046274 + error │   40.901 μs │ 0 bytes │      0 │
│ 59.948425031894104         │   34.760 μs │ 0 bytes │      0 │
│ 59.948425031894104 + error │   95.657 μs │ 0 bytes │      0 │
│ 100.0                      │   86.298 μs │ 0 bytes │      0 │
│ 100.0 + error              │  234.108 μs │ 0 bytes │      0 │
└────────────────────────────┴─────────────┴─────────┴────────┘
```

---------

Co-authored-by: Ian Weaver <weaveric@gmail.com>
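For readers unfamiliar with the lazy-cutout idea described in the commit message, here is a minimal sketch of how a custom struct can expose `getindex`, `axes`, and related methods without allocating. It assumes a rectangular cutout and a precomputed weight matrix. The name `LazyWeightedCutout` and its fields are hypothetical; this is not Photometry.jl's actual `WeightedApertureCutout`.

```julia
# Hypothetical sketch of a lazy, weighted cutout. Subtyping AbstractMatrix
# and defining `size` and `getindex` gives us `axes`, `sum`, `collect`,
# iteration, and so on from Base's generic array fallbacks.
struct LazyWeightedCutout{T,D<:AbstractMatrix,W<:AbstractMatrix} <: AbstractMatrix{T}
    data::D               # parent image (not copied)
    weights::W            # per-pixel weights, same size as the cutout
    rows::UnitRange{Int}  # rows of `data` covered by the cutout
    cols::UnitRange{Int}  # columns of `data` covered by the cutout
end

function LazyWeightedCutout(data::AbstractMatrix, weights::AbstractMatrix, rows, cols)
    size(weights) == (length(rows), length(cols)) ||
        throw(DimensionMismatch("weights must match the cutout size"))
    T = promote_type(eltype(data), eltype(weights))
    return LazyWeightedCutout{T,typeof(data),typeof(weights)}(data, weights, rows, cols)
end

Base.size(c::LazyWeightedCutout) = (length(c.rows), length(c.cols))

# Each element is computed on demand: parent pixel times its weight.
Base.@propagate_inbounds function Base.getindex(c::LazyWeightedCutout, i::Int, j::Int)
    return c.data[c.rows[i], c.cols[j]] * c.weights[i, j]
end

# Usage: a 2x2 cutout from the middle of a 4x4 image.
img = reshape(collect(1.0:16.0), 4, 4)
w = [1.0 0.5; 0.5 0.0]
cut = LazyWeightedCutout(img, w, 2:3, 2:3)

axes(cut)  # works without calling `collect` first
sum(cut)   # weighted sum over the cutout
```

Because the struct is an `AbstractMatrix`, a downstream function that expects matrix-like input can index it or query its axes directly, which is the property the commit message says the `Transducers.Eduction` lacked.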
If you copy-paste the revised function definitions from this PR into your REPL, you can run the following comparison, which will show that the new implementation is ~3x faster and allocation-free.
EDIT: If you have to collect the input (as we do because of how Photometry.jl passes us the cutout), then it costs 2 allocations and adds ~100 ns to the base-case runtime.
I chose to do `com_psf(T::Type{<:AbstractFloat}, img_ap, rel_thresh)` because Float64 is not really any slower in my testing, so this makes it easier for us to switch to Float64 in the future if we ever want to.
My result:
EDIT: Because Photometry.jl will pass us an object from Transducers.jl, we have to call `collect`, which costs 2 allocations and adds about 100 ns of runtime compared with not collecting (as we could with a plain matrix input).
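To illustrate the design choice of taking the float type as a leading argument, here is a rough sketch of the pattern. It is an assumption-laden stand-in, not the `com_psf` from this PR: `centroid_sketch` is a hypothetical name, and the thresholded intensity-weighted centroid it computes is only a guess at the general kind of calculation involved. It also assumes the image maximum is positive.

```julia
# Hypothetical illustration only; not the com_psf defined in this PR.
# All accumulation happens in the caller-chosen float type T, so switching
# precision later only means changing the default in the convenience method.
function centroid_sketch(::Type{T}, img::AbstractMatrix, rel_thresh) where {T<:AbstractFloat}
    thresh = T(rel_thresh) * T(maximum(img))
    total = zero(T)
    sx = zero(T)
    sy = zero(T)
    for idx in CartesianIndices(img)
        v = T(img[idx])
        v < thresh && continue   # skip pixels below the relative threshold
        total += v
        sx += v * idx[1]         # row coordinate, weighted by intensity
        sy += v * idx[2]         # column coordinate, weighted by intensity
    end
    return (sx / total, sy / total)
end

# Convenience method with an assumed Float32 default.
centroid_sketch(img, rel_thresh) = centroid_sketch(Float32, img, rel_thresh)

# Usage: a single bright pixel at row 3, column 4.
img = zeros(5, 5)
img[3, 4] = 1.0
c64 = centroid_sketch(Float64, img, 0.5)
c32 = centroid_sketch(img, 0.5)
```

The loop indexes `img` directly, so it needs matrix-like input; a lazy iterator without `getindex` would have to be `collect`ed first, which is where the extra allocations mentioned above come from.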