Support cutoff number of features per slide #32
Open
andsild wants to merge 5 commits into DigitalSlideArchive:main
Conversation
This is to speed up the AL loop. It's not a perfect solution: the UI will now recommend that users predict "default" for many of the labels. But it is a first step toward handling large slides with millions of annotations.
This may not have been a bug before, but now that indices may be out of order (since we are using `cutoff`), it becomes relevant.
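Sketched concretely (illustrative names, not the PR's actual code), the ordering concern is that once a subsample is taken, row order in the prediction output no longer matches superpixel order, so positional assumptions break and predictions must be mapped back through the recorded indices:

```python
import numpy as np

# Hypothetical example: a cutoff subsample leaves unsorted,
# non-contiguous indices, one per prediction row.
used_indices = np.array([7, 2, 11])       # subsampled, unsorted
predictions = np.array([0.3, 0.9, 0.5])   # one per row, NOT one per superpixel

# Map row-level predictions back onto the full slide of 12 superpixels;
# superpixels that were cut off stay NaN.
full = np.full(12, np.nan)
full[used_indices] = predictions
```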
Easier for tests
Author

Some tests may depend on #33
Adds an option for `cutoff` features/predictions to be used, so that you only run predictions on n unlabeled features per round/epoch. All labeled samples are always included. This is to speed up training in cases where you have many slides with many annotations (in our AML case, several million). The benefit is speed; the downside is that feature files may no longer contain all the data if a user wants to download them. The intuition is that I doubt anyone will annotate more than a few thousand samples in any round.
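A minimal sketch of the selection logic under these rules (the function name and the `-1` "unlabeled" sentinel are assumptions for illustration, not the PR's actual API):

```python
import numpy as np

def select_feature_indices(labels, cutoff, rng=None):
    """Pick the indices to run predictions on: every labeled sample,
    plus at most `cutoff` randomly chosen unlabeled ones.

    `labels` is an array where -1 marks an unlabeled sample
    (sentinel chosen here for illustration).
    """
    rng = np.random.default_rng() if rng is None else rng
    labels = np.asarray(labels)
    labeled = np.flatnonzero(labels != -1)      # always included
    unlabeled = np.flatnonzero(labels == -1)
    if cutoff is not None and len(unlabeled) > cutoff:
        # Subsample without replacement; note the result is unsorted,
        # which is why downstream code can no longer assume ordered indices.
        unlabeled = rng.choice(unlabeled, size=cutoff, replace=False)
    return np.concatenate([labeled, unlabeled])
```

With one million unlabeled superpixels and `cutoff=5000`, each round predicts on 5000 rows plus the labeled set instead of the full million.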
The commits also sneak in a change: labeled samples now occur last in the DSA AL filmstrip, by assigning them a low confidence score. Feature files also get a "used_indices" list that makes it possible to track which features come from which superpixel.
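As a sketch of what that payload could look like (the schema, names, and sentinel value here are illustrative, not the PR's actual feature-file format):

```python
import numpy as np

def build_feature_payload(features, used_indices, confidences,
                          labeled_mask, sentinel=-1.0):
    """Bundle the predicted subset with a "used_indices" array mapping
    each row back to its source superpixel. Confidences of already-
    labeled samples are overwritten with a low sentinel so a filmstrip
    sorted by descending confidence shows them last."""
    confidences = np.asarray(confidences, dtype=float).copy()
    confidences[np.asarray(labeled_mask)] = sentinel
    return {
        "features": np.asarray(features),
        "used_indices": np.asarray(used_indices),
        "confidences": confidences,
    }
```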
I was paranoid about breaking anything, so I have also included tests. The tests verify each step: superpixel generation, feature extraction, training, and prediction. There is also one test for the whole pipeline, which uses an MNIST slide and verifies that the predictions achieve > 80% accuracy. The tests can also be used for benchmarking.
Also see #31, which allows anyone to disable CUDA for easier testing on local machines.
I have tested in DSA with my own AML slides and also using the default superpixels from the UI.