Open-source whole-slide pathology agent for morphology triage ROI collection, morphology-first diagnosis, and interactive WSI exploration. Supports general slides and acute myeloid leukemia (AML) slides.
The project combines deterministic slide reduction, embedding-based candidate retrieval, and a vision-language model that navigates high-value regions instead of trying to reason over an entire gigapixel slide at once.
- Interactive FastAPI workbench for browser-based runs and live monitoring.
- Headless evaluation scripts for single-slide, batch, and cache-precomputation workflows.
- Support for standard WSI formats plus MIRAX files and server-local slide browsing.
System overview and WSI workflow.
Web workbench run view and result flow.
Evaluated on a private dataset of 372 bone marrow WSIs. VLMs were run in official FP8 quantized variants where available.
Results are averaged across available feature extractors. roi5 % is the percentage of runs where the model successfully reached the 5-ROI target in our task.
| Model | Success % | roi5 % | Avg. tool calls |
|---|---|---|---|
| Gemma-4-31B-it | 100.00 | 71.30 | 21.24 |
| Qwen3.5-397B-A17B-FP8 | 99.66 | 99.26 | 25.07 |
| DeepSeek-V4 | 100.00 | 99.73 | 27.10 |
| GLM-4.6V-FP8 | 92.14 | 95.63 | 26.28 |
| GPT-OSS-120B | 99.63 | 23.99 | 37.36 |
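The roi5 % column follows directly from its definition above. A minimal sketch, assuming each run is summarized by its number of accepted ROIs (the function name and input shape are illustrative, not taken from the project code):

```python
def roi5_rate(accepted_roi_counts, target=5):
    """Percentage of runs whose accepted-ROI count reached the target."""
    if not accepted_roi_counts:
        raise ValueError("no runs to summarize")
    reached = sum(1 for count in accepted_roi_counts if count >= target)
    return 100.0 * reached / len(accepted_roi_counts)


# Four hypothetical runs, three of which reached the 5-ROI target.
print(f"{roi5_rate([5, 5, 3, 5]):.2f}")
```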
Gemma-4-31B-it and Qwen3.5-397B-A17B-FP8 were the main diagnosis models explored here: Gemma-4-31B-it showed stronger AML-morphology specificity, while Qwen3.5-397B-A17B-FP8 was more general in blood-cell image understanding.
Prompt wording has some effect, but the results are not dominated by prompt changes alone.
| Model | Extractor | Acc % | TP/FN | TN/FP | NPM1 % | HistSim |
|---|---|---|---|---|---|---|
| Gemma-4-31B-it | DinoBloom | 70.16 | 234/85 | 27/26 | 74.43 | 0.889 |
| Gemma-4-31B-it | H-optimus-1 | 72.85 | 246/73 | 25/28 | 71.74 | 0.897 |
| Gemma-4-31B-it | UNI2 | 78.23 | 269/50 | 22/31 | 73.02 | 0.919 |
| Gemma-4-31B-it | Virchow2 | 77.96 | 270/49 | 20/33 | 72.00 | 0.907 |
| Qwen3.5-397B-A17B-FP8 | DinoBloom | 60.48 | 177/142 | 48/5 | 65.67 | 0.885 |
| Qwen3.5-397B-A17B-FP8 | H-optimus-1 | 61.02 | 182/137 | 45/8 | 65.96 | 0.896 |
| Qwen3.5-397B-A17B-FP8 | UNI2 | 69.09 | 215/104 | 42/11 | 64.15 | 0.915 |
| Qwen3.5-397B-A17B-FP8 | Virchow2 | 66.13 | 207/112 | 39/14 | 67.88 | 0.906 |
HistSim measures histogram similarity between clinician-selected ROIs and agent-selected ROIs.
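The Acc % column is consistent with (TP + TN) / total over the 372 slides. A short sketch that reproduces it from the confusion counts; sensitivity and specificity are added for convenience and are not reported in the table:

```python
def confusion_metrics(tp, fn, tn, fp):
    """Derive percentage metrics from raw confusion counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": 100.0 * (tp + tn) / total,
        "sensitivity": 100.0 * tp / (tp + fn),
        "specificity": 100.0 * tn / (tn + fp),
    }


# Gemma-4-31B-it with DinoBloom: TP/FN = 234/85, TN/FP = 27/26.
metrics = confusion_metrics(tp=234, fn=85, tn=27, fp=26)
print(round(metrics["accuracy"], 2))  # 70.16
```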
uv sync
source .venv/bin/activate
If your model backend needs API keys or other runtime settings, create a .env file with the variables you use locally.
The OpenAI-compatible client settings live in configs/config.yaml under the agent section:
agent:
  OPENAI_API_BASE: "http://pluto/v1"
  OPENAI_API_KEY: "local"
Environment variables still override the YAML values, so you can also set them in your shell or .env:
export OPENAI_API_BASE="http://your-server/v1"
export OPENAI_API_KEY="your-key"
To change which models the workbench exposes, set MODEL_NAME and ALLOWED_MODEL_NAMES in main.py and wsi_core.py.
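The precedence between environment variables and the YAML agent section can be pictured as follows. This is a sketch under assumptions: `resolve_client_settings` is a hypothetical helper, the YAML section is passed in as an already-parsed dict, and an empty environment value is treated as unset.

```python
import os

CLIENT_KEYS = ("OPENAI_API_BASE", "OPENAI_API_KEY")


def resolve_client_settings(yaml_agent_section, env=None):
    """Return client settings, preferring environment values over YAML."""
    env = os.environ if env is None else env
    settings = {}
    for key in CLIENT_KEYS:
        # Assumption: an empty environment value falls back to the YAML value.
        settings[key] = env.get(key) or yaml_agent_section.get(key)
    return settings


yaml_section = {"OPENAI_API_BASE": "http://pluto/v1", "OPENAI_API_KEY": "local"}
print(resolve_client_settings(yaml_section, env={"OPENAI_API_KEY": "your-key"}))
```

Passing `env` explicitly keeps the helper easy to test without touching the real process environment.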
To expose server-local slide roots in the web Explorer, set SERVER_SLIDE_ROOTS before starting the app:
export SERVER_SLIDE_ROOTS="/some/other/root"
Supported sources:
- Standard slide files: .svs, .tif, .tiff, .ndpi
- MIRAX files: .mrxs, .mrsx
- MIRAX companion folders: select the folder whose sibling .mrxs or .mrsx has the same stem
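The source rules above could be resolved along these lines. This is an illustrative sketch, not the project's loader: `resolve_slide_source` is a hypothetical name, and the only behavior taken from the list is the set of extensions and the same-stem sibling rule for MIRAX companion folders.

```python
from pathlib import Path

STANDARD_SUFFIXES = {".svs", ".tif", ".tiff", ".ndpi"}
MIRAX_SUFFIXES = {".mrxs", ".mrsx"}


def resolve_slide_source(path):
    """Return the slide file to open for a selected path, or None if unsupported."""
    selected = Path(path)
    if selected.is_dir():
        # Companion folder: look for a sibling file named <folder>.mrxs or <folder>.mrsx.
        for suffix in sorted(MIRAX_SUFFIXES):
            sibling = selected.parent / (selected.name + suffix)
            if sibling.is_file():
                return sibling
        return None
    if selected.suffix.lower() in STANDARD_SUFFIXES | MIRAX_SUFFIXES:
        return selected
    return None
```

Selecting the folder `case01` next to `case01.mrxs` would return the `.mrxs` file, while a folder with no matching sibling returns `None`.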
python main.py
The app starts on port 3008 by default and falls forward to the next free port if needed.
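The port behavior described above can be approximated with a simple bind probe. This is a sketch of the general technique, assuming a linear scan from the default port; `find_free_port`, the host, and the scan limit are illustrative choices, not the app's actual startup code.

```python
import socket


def find_free_port(start=3008, host="127.0.0.1", max_tries=50):
    """Return the first port at or after `start` that can be bound on `host`."""
    for port in range(start, start + max_tries):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            try:
                probe.bind((host, port))
            except OSError:
                continue  # Port is taken or not bindable; try the next one.
            return port
    raise RuntimeError(f"no free port in range {start}-{start + max_tries - 1}")


print(find_free_port())
```

A probe like this is inherently racy, since another process can claim the port between the check and the real server bind, so it is best treated as a hint.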
WSI
-> tile extraction and quality filtering
-> embedding-based candidate ranking
-> agentic ROI navigation and selection
-> morphology-only diagnosis from accepted ROIs
aml_auto is the default production path:
WSI
-> aml_roi
-> roi_collection.json + ROI images
-> aml_diagnosis
-> final JSON diagnosis + report
The web UI is organized around three panels:
- Input and Run: slide source, agent, model, extractor, tile size, batch size, and tile filter.
- Slide Viewer: slide overview plus ROI snapshots collected during navigation.
- Run Status: live steps, current state, errors, and report link.
You can start a run from:
- uploaded slide files such as .svs, .tif, .tiff, .ndpi
- MIRAX folders or zip bundles
- the built-in server Explorer for server-local or HPC slide roots
- Agent: aml_auto, aml_roi, aml_diagnosis, tile, or wsi
- Model: which VLM is exposed in the workbench
- Feature extractor: embedding backbone used for ROI candidate preparation
- Tile size: patch size for the extractor path
- Batch size: embedding throughput during tile feature extraction
- Tile filter: candidate prefilter before expensive embedding
Tile filter modes:
- Quality score: ranks tiles by focus, stain, texture, and artifact heuristics.
- Coarse to fine: thumbnail-level region prefilter before embedding.
- Hybrid: combines region screening with raw-tile quality prefiltering.
- None: keeps only the baseline foreground and texture gating.
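To make the Quality score mode concrete, here is a rough sketch of ranking tiles by a combined heuristic before embedding. Everything specific in it is an assumption: the per-tile scores are taken as already computed in [0, 1], the weights are arbitrary, and the project's real heuristics and weighting are not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TileScores:
    tile_id: str
    focus: float     # higher is sharper
    stain: float     # higher is better stained
    texture: float   # higher is more cellular texture
    artifact: float  # higher means more artifact, so it is penalized


def quality_score(tile, weights=(0.4, 0.3, 0.3, 0.5)):
    """Weighted sum of the positive cues minus an artifact penalty (hypothetical weights)."""
    w_focus, w_stain, w_texture, w_artifact = weights
    return (w_focus * tile.focus + w_stain * tile.stain
            + w_texture * tile.texture - w_artifact * tile.artifact)


def prefilter(tiles, keep):
    """Keep the `keep` best-scoring tiles for the expensive embedding step."""
    return sorted(tiles, key=quality_score, reverse=True)[:keep]


tiles = [
    TileScores("a", 0.9, 0.8, 0.7, 0.0),
    TileScores("b", 0.9, 0.8, 0.7, 1.0),  # sharp but dominated by artifact
    TileScores("c", 0.5, 0.5, 0.5, 0.0),
]
print([t.tile_id for t in prefilter(tiles, keep=2)])
```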
- Choose a slide source or browse the server Explorer.
- Pick the agent, model, extractor, tile size, batch size, and tile filter.
- Start the run.
- Follow the live trace while reviewing the overview and ROI panes.
| agent_type | Purpose | Slide input | Output |
|---|---|---|---|
| aml_auto | Full two-stage AML workflow | required | ROI bundle + diagnosis report |
| aml_roi | Stage 1 ROI collection only | required | roi_collection.json + ROI images |
| aml_diagnosis | Stage 2 diagnosis from existing ROIs | not required | strict AML JSON diagnosis |
| tile | Save good and bad tiles for curation or training | required | labeled tiles under Selected_Tiles |
| wsi | General-purpose pathology exploration agent | required | task-shaped report and saved ROIs |
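The slide-input column of the table above translates into a simple pre-run check. A minimal sketch, assuming a hypothetical `validate_run_request` helper; only the agent names and their slide requirements come from the table.

```python
# True means the agent needs a slide; aml_diagnosis works from existing ROIs.
AGENT_REQUIRES_SLIDE = {
    "aml_auto": True,
    "aml_roi": True,
    "aml_diagnosis": False,
    "tile": True,
    "wsi": True,
}


def validate_run_request(agent_type, slide_path=None):
    """Raise ValueError if the agent is unknown or a required slide is missing."""
    if agent_type not in AGENT_REQUIRES_SLIDE:
        raise ValueError(f"unknown agent_type: {agent_type!r}")
    if AGENT_REQUIRES_SLIDE[agent_type] and not slide_path:
        raise ValueError(f"{agent_type} requires a slide input")


validate_run_request("aml_diagnosis")             # passes without a slide
validate_run_request("aml_auto", "case01.mrxs")   # passes with a slide
```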
aml_roi is the evidence-acquisition stage. It navigates the slide, opens ranked candidates, and marks acceptable high-power ROIs toward the configured target of 5 for downstream diagnosis.
Outputs include:
- roi_collection.json
- images/roi_N.jpg
- images/slide_overview.jpg
- images/roi_candidates.jpg
- navigation summary report.md
aml_diagnosis receives ROI images only. It does not navigate the slide and does not see slide-level metadata. The output is a strict JSON object with morphology summary, accepted ROI reasoning, blast range, triage zone, final decision, limitations, confidence, and conditional NPM1 prediction.
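A strict JSON contract like the one described above is usually enforced by rejecting missing or unexpected keys. The sketch below is illustrative only: the snake_case key names are hypothetical stand-ins for the listed fields, and the NPM1 field is treated as optional on the assumption that "conditional" means it is not always present.

```python
import json

# Hypothetical key names derived from the field list; the real schema may differ.
REQUIRED_KEYS = {
    "morphology_summary", "accepted_roi_reasoning", "blast_range",
    "triage_zone", "final_decision", "limitations", "confidence",
}
OPTIONAL_KEYS = {"npm1_prediction"}


def parse_diagnosis(raw):
    """Parse model output and enforce the expected top-level keys."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("diagnosis must be a JSON object")
    missing = REQUIRED_KEYS - data.keys()
    unexpected = data.keys() - REQUIRED_KEYS - OPTIONAL_KEYS
    if missing or unexpected:
        raise ValueError(
            f"missing keys: {sorted(missing)}, unexpected keys: {sorted(unexpected)}"
        )
    return data
```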
wsi is the non-AML mode for broader pathology tasks such as tumour description, MSI-oriented inspection, inflammation review, or other prompt-defined workflows.
When prompted for MSI screening, the agent samples at least three distinct tumour regions, marks representative ROIs, and returns a qualitative MSI-H or MSS impression with the caveat that definitive status still requires IHC or molecular testing.
See the dedicated CLI guide: evaluate/README.md.
Main entrypoints:
- evaluate/run_single_slide.py: single-case headless run
- evaluate/run_batch_aml.sh: CSV-driven AML batch
- evaluate/run_batch_aml_suite.sh: model/extractor suite
- evaluate/preextract_hybrid_cache.py: feature-cache prewarm
- evaluate/preextract_hybrid_cache_suite.sh: multi-extractor prewarm
AML mode uses retrieval against curated reference tiles to improve ROI ranking. Prebuilding the reference index can reduce startup cost significantly.
Quick start:
python -m wsi_core_pkg.embeddings.prebuild_reference_embeddings \
--tiles-root ./Selected_Tiles \
--output-dir ./outputs/cache/reference_hnsw \
--extractor reddino
Detailed notes live in evaluate/README.md under Reference Embeddings.
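The idea of ranking ROI candidates against reference tiles can be illustrated with a brute-force cosine search. This is a stand-in for intuition only: the output directory name suggests an HNSW index, which gives approximate nearest neighbors at scale, and the scoring rule below (best similarity to any reference) is an assumption rather than the project's documented method.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(candidates, references):
    """Order candidate tile ids by their best similarity to any reference embedding."""
    scores = {
        tile_id: max(cosine(embedding, ref) for ref in references)
        for tile_id, embedding in candidates.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


references = [[1.0, 0.0]]
candidates = {"a": [0.9, 0.1], "b": [0.0, 1.0], "c": [0.5, 0.5]}
print(rank_candidates(candidates, references))
```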
- evaluate/README.md: CLI workflows and reference-embedding setup
- docs/AML_AGENT_PIPELINE_METHODOLOGY.md: methodology and algorithmic description
- docs/AML_AGENT_PIPELINE_IMPLEMENTATION.md: implementation details
- docs/AML_AGENT_TOOLS_APPENDIX.md: tool and navigation appendix
- docs/AML_PROMPTS_APPENDIX.md: prompt appendix
- docs/EVALUATION_METRICS.md: metric definitions