Support cutoff number of features per slide #32
Open
andsild wants to merge 5 commits into DigitalSlideArchive:main
Conversation
This is to speed up the AL loop. It's not a perfect solution: the UI will now recommend that users predict "default" for many of the labels. But it is a first step toward handling large slides with millions of annotations.
This may not have been a bug before, but now that indices may be out of order (since we are using `cutoff`), it becomes relevant.
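Sketched concretely (illustrative names, not the PR's actual code), the ordering concern is that once a subsample is taken, row order in the prediction output no longer matches superpixel order, so positional assumptions break and predictions must be mapped back through the recorded indices:

```python
import numpy as np

# Hypothetical example: a cutoff subsample leaves unsorted,
# non-contiguous indices, one per prediction row.
used_indices = np.array([7, 2, 11])       # subsampled, unsorted
predictions = np.array([0.3, 0.9, 0.5])   # one per row, NOT one per superpixel

# Map row-level predictions back onto the full slide of 12 superpixels;
# superpixels that were cut off stay NaN.
full = np.full(12, np.nan)
full[used_indices] = predictions
```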
Easier for tests
Author

Some tests may depend on #33
Adds an option for `cutoff` features/predictions to be used, so that you only run predictions on n unlabeled features per round/epoch. All labeled samples are always included. This is to speed up training in cases where you have many slides with many annotations (in our AML case, several million). The benefit is speed; the downside is that feature files may no longer contain all the data if a user wants to download them. The intuition is that I doubt anyone will annotate more than a few thousand samples in any round.
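A minimal sketch of the selection logic under these rules (the function name and the `-1` "unlabeled" sentinel are assumptions for illustration, not the PR's actual API):

```python
import numpy as np

def select_feature_indices(labels, cutoff, rng=None):
    """Pick the indices to run predictions on: every labeled sample,
    plus at most `cutoff` randomly chosen unlabeled ones.

    `labels` is an array where -1 marks an unlabeled sample
    (sentinel chosen here for illustration).
    """
    rng = np.random.default_rng() if rng is None else rng
    labels = np.asarray(labels)
    labeled = np.flatnonzero(labels != -1)      # always included
    unlabeled = np.flatnonzero(labels == -1)
    if cutoff is not None and len(unlabeled) > cutoff:
        # Subsample without replacement; note the result is unsorted,
        # which is why downstream code can no longer assume ordered indices.
        unlabeled = rng.choice(unlabeled, size=cutoff, replace=False)
    return np.concatenate([labeled, unlabeled])
```

With one million unlabeled superpixels and `cutoff=5000`, each round predicts on 5000 rows plus the labeled set instead of the full million.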
The commits also sneak in a change: labeled samples now occur last in the DSA AL filmstrip, by assigning them a low confidence score. Feature files also get a "used_indices" list that makes it possible to track which features come from which superpixel.
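As a sketch of what that payload could look like (the schema, names, and sentinel value here are illustrative, not the PR's actual feature-file format):

```python
import numpy as np

def build_feature_payload(features, used_indices, confidences,
                          labeled_mask, sentinel=-1.0):
    """Bundle the predicted subset with a "used_indices" array mapping
    each row back to its source superpixel. Confidences of already-
    labeled samples are overwritten with a low sentinel so a filmstrip
    sorted by descending confidence shows them last."""
    confidences = np.asarray(confidences, dtype=float).copy()
    confidences[np.asarray(labeled_mask)] = sentinel
    return {
        "features": np.asarray(features),
        "used_indices": np.asarray(used_indices),
        "confidences": confidences,
    }
```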
I was paranoid about breaking anything, so I have also included tests. The tests verify each step: superpixel generation, feature extraction, training, and prediction. There is also one test for the whole pipeline, which uses an MNIST slide and verifies that the predictions achieve > 80% accuracy. The tests can also be used for benchmarking.
Also see #31, which allows anyone to disable CUDA for easier testing on local machines.
I have tested in DSA with my own AML slides and also using the default superpixels from the UI.