Skip to content

Add override mechanism for algorithm containers, transfer explicit .sif images with HTCondor#464

Open
jhiemstrawisc wants to merge 12 commits intoReed-CompBio:mainfrom
jhiemstrawisc:explicit-sif-transfer
Open

Add override mechanism for algorithm containers, transfer explicit .sif images with HTCondor#464
jhiemstrawisc wants to merge 12 commits intoReed-CompBio:mainfrom
jhiemstrawisc:explicit-sif-transfer

Conversation

@jhiemstrawisc
Copy link
Copy Markdown
Collaborator

This PR aims to accomplish a few things:

  1. It adds a new valid keys to config.yaml that let a user say "For reconstruction algorithm foo, I want you to use container image bar". In the general case, I'm not sure how much this is needed, but @agitter and I have floated the concept in the past, and it helps me solve 2. This implements a 4-tier hierarchy over how much of the image name is overridden (with special logic for .sif extensions), e.g.:
containers:
  registry:
    base_url: docker.io
    owner: reedcompbio
    
    images:
      omicsintegrator1: "oi1:latest"
      omicsintegrator2: "jhiemstra/oi2:latest
      mincostflow: "hub.opensciencegrid.org/jhiemstra/mincostflow:latest
      allpairs: "images/allpairs.sif"

would result in container URIs of

  • omicsintegrator1 --> docker.io/reedcompbio/oi1:latest
  • omicsintegrator2 --> docker.io/jhiemstra/oi2:latest1
  • mincostflow --> hub.opensciencegrid.org/jhiemstra/mincostflow:latest
  • allpairs --> images/allpairs.sif (local file)

The caveat here is that I don't believe all container registries follow this kind of <base_url>/<owner>/<container> hierarchy, but it's at least true for docker, ghcr and hub.opensciencegrid.org. If a user finds themselves somewhere outside those, they can always declare the entire URI explicitly.

  1. When a container override contains the .sif extension, the file is added to the reconstruct rule's htcondor_transfer_input_files resource key. This triggers HTCondor to transfer the sif image as part of the job's input sandbox.
  2. When a .sif override is present and apptainer/singularity is the configured container framework, SPRAS uses the local file instead of pulling/building from a remote registry. This is combined with 2) are the key fixes for Provide guidance on working around docker rate limiting for large CHTC runs #462
  3. Some of the container resolution logic started to get sufficiently nested with if/else logic that I wanted to split it out into a helper function I could more easily test. This is where most of the new test code comes from.
  4. Finally, I fixed a small-but-inconsequential bug I noticed in the way we treated singularity and apptainer container frameworks differently in some cases. I added a helper function that makes it easier to treat these as synonyms.

Note that I haven't yet documented this guidance, as #462 requests -- I'd rather get this over the finish line first, then add the documentation to #459 (so I don't create conflicts for myself)

Copy link
Copy Markdown
Collaborator

@agitter agitter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes make sense to me overall.

Comment thread config/config.yaml Outdated
Comment thread config/config.yaml
Comment thread spras/config/container_schema.py
Comment thread spras/containers.py
Comment thread test/test_container_image_resolution.py Outdated
@tristan-f-r tristan-f-r added the enhancement New feature or request label Mar 18, 2026
@github-actions github-actions Bot added the merge-conflict This PR has merge conflicts. label Apr 7, 2026
@read-the-docs-community
Copy link
Copy Markdown

read-the-docs-community Bot commented Apr 7, 2026

Documentation build overview

📚 spras | 🛠️ Build #32474638 | 📁 Comparing 2599ec8 against latest (87b314c)

  🔍 Preview build  

5 files changed · ± 5 modified

± Modified

Copy link
Copy Markdown
Collaborator

@tristan-f-r tristan-f-r left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems fine from a big picture: I am of course wary of:

  • The dynamic algorithm import for the new ALGORITHM_REGISTRY, but the dependency diamond there seems unavoidable.
  • The strange conflation for having both image_override and images in the same structure, but I have no clean ideas for addressing that at the moment [and I don't want to block this PR on waiting for a better refactor to come along].

I'll add some smaller comments 👍

@jhiemstrawisc jhiemstrawisc force-pushed the explicit-sif-transfer branch from 98aa98d to 390e197 Compare April 8, 2026 14:04
@github-actions github-actions Bot removed the merge-conflict This PR has merge conflicts. label Apr 8, 2026
@github-actions github-actions Bot added the merge-conflict This PR has merge conflicts. label Apr 17, 2026
Copy link
Copy Markdown
Collaborator

@agitter agitter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The updates addressed my major comments. I only have one minor new one.

However, my local tests fail. Some are unrelated to these changes, some are related:

FAILED test/RWR/test_RWR.py::TestRWR::test_rwr - AssertionError: Output file does not match expected output file
FAILED test/ResponseNet/test_rn.py::TestResponseNet::test_responsenet_required - AssertionError: assert False
FAILED test/ResponseNet/test_rn.py::TestResponseNet::test_responsenet_all_optional - AssertionError: assert False
FAILED test/ST_RWR/test_STRWR.py::TestSTRWR::test_strwr - AssertionError: Output file does not match expected output file
FAILED test/analysis/test_cytoscape.py::TestCytoscape::test_cytoscape - spras.containers.ContainerError: (Command formatted as list: `['python', '/py4cytoscape/cytoscape_util.py', '--output', '/spras/T32FPZG/cytoscape.cys', '--pathway', '/spras/5M63HVH/pathway.txt|test\\analysis\\...
FAILED test/analysis/test_summary.py::TestSummary::test_example_networks[example] - AssertionError: assert False
FAILED test/analysis/test_summary.py::TestSummary::test_example_networks[egfr] - AssertionError: assert False
FAILED test/evaluate/test_evaluate.py::TestEvaluate::test_node_precision_recall_pca_chosen_pathway - assert False
FAILED test/generate-inputs/test_generate_inputs.py::TestGenerateInputs::test_prepare_inputs_networks - AssertionError: assert False
FAILED test/parse-outputs/test_parse_outputs.py::TestParseOutputs::test_parse_outputs - AssertionError: assert False
FAILED test/parse-outputs/test_parse_outputs.py::TestParseOutputs::test_duplicate_edges - AssertionError: assert False
FAILED test/test_container_image_resolution.py::TestPrepareSingularityImage::test_local_sif_with_unpack_builds_sandbox - AttributeError: module 'spython' has no attribute 'main'
FAILED test/test_container_image_resolution.py::TestPrepareSingularityImage::test_registry_with_unpack_pulls_and_builds - AttributeError: module 'spython' has no attribute 'main'
FAILED test/test_container_image_resolution.py::TestPrepareSingularityImage::test_local_sif_no_unpack_returns_sif_path - ModuleNotFoundError: No module named 'pwd'
FAILED test/test_container_image_resolution.py::TestPrepareSingularityImage::test_no_override_no_unpack_returns_docker_uri - ModuleNotFoundError: No module named 'pwd'
FAILED test/test_container_image_resolution.py::TestPrepareSingularityImage::test_unpack_skips_build_if_sandbox_exists - AttributeError: module 'spython' has no attribute 'main'

@tristan-f-r do you want to leave more detailed comments still?

Comment thread spras/config/config.py Outdated
@agitter
Copy link
Copy Markdown
Collaborator

agitter commented Apr 17, 2026

I suspect the root cause of my spython and pwd errors is that pwd is Unix only: https://docs.python.org/3/library/pwd.html. If that's true, we could skip those tests or mock pwd.

@tristan-f-r
Copy link
Copy Markdown
Collaborator

tristan-f-r commented Apr 17, 2026

Sounds good. I didn't find anything in my code passthrough, unfortunately.

…th HTCondor

This accomplishes two main things:
1. Users can explicitly state what container they want a given PRM
to run in via the configuration file, using the PRM name (as defined
in the config file) as the key.
2. When users specify an override `.sif` image, that image is added
to an HTCondor transfer list such that Condor moves the file to the
EP for execution (to avoid pulling during the job). Explicitly moving
required input files is a "best practice" in HTCondor, because failure
to resolve inputs at runtime squanders capacity.

When no override is provided or the HTCondor Snakemake executor isn't
available, the new Snakefile resource rule should be a no-op.

In addition to adding the features, I tried to split up some other
functions in and around container resolution to make them more testable.
This adds a more robust way to check whether the container framekwork
is apptainer/singularity, which for the purposes of our codebase should
be treated as synonyms.

I decided to do this after noticing an issue in test logs where the container
framework was set to apptainer, and `unpack_singularity` was true -- the
unpacking behavior happened correctly despite a logged warning claiming it
wouldn't happen because the warning only checked for singularity.

I believe this diff makes that type of mistake a little harder.
As I started writing the PR message, I realized things weren't quite the way I wanted them to be w.r.t. this hierarchy. Thisshould fix it.
Introduce ALGORITHM_REGISTRY in spras/config/util.py as the single
source of truth for all supported algorithms, mapping algorithm names
to (module_path, class_name) tuples. Add an AlgorithmName enum
auto-generated from the registry keys (case-insensitive via
CaseInsensitiveEnum) and a get_valid_algorithm_names() helper.

Replace the 11 manual algorithm imports and hardcoded dict in
spras/runner.py with a _load_algorithms() function that uses importlib
to load classes from the registry at import time.  get_algorithm() now
validates via AlgorithmName instead of a raw .lower() lookup.

To register a new algorithm, add one entry to ALGORITHM_REGISTRY --
no changes to runner.py are needed.
After building ProcessedContainerSettings, check that every key in
containers.images is a recognized algorithm name by passing it through
AlgorithmName.  Raise ValueError with a clear message and the list of
valid names if a typo or unknown name is found.

Add test_config_container_images_invalid_algorithm to verify the
validation rejects unrecognized keys.
Update the Step 4 walkthrough and checklist item 5 to reflect the new
algorithm registration mechanism.  Adding a new algorithm now requires
a single entry in ALGORITHM_REGISTRY in spras/config/util.py instead
of manually importing the class and editing the dict in runner.py.
Replace the two-layer design (_resolve_container_override +
_resolve_singularity_image both reading image_override) with a single
entry point resolve_container_image() that returns a frozen
ResolvedImage dataclass carrying the image string and an is_local_sif
flag.

Key changes in containers.py:
- Add ResolvedImage(image, is_local_sif) frozen dataclass
- resolve_container_image() is the single place image_override is read;
  .sif + singularity returns ResolvedImage(sif_path, True) directly
- Rename _resolve_singularity_image to _prepare_singularity_image;
  it receives ResolvedImage and never re-reads image_override. Its job
  is to prepare (e.g. unpack) the apptainer image
- run_container_singularity() takes ResolvedImage instead of str
- Add warnings for suspicious overrides (excessive path depth, bare
  hostname) with "Attempting to use as-is" messaging
Expand the containers.images comment to mention that shared-fs-usage
is configured via spras_profile, making the context clearer for users
unfamiliar with the HTCondor integration.
Adds a space to an error message, and stops some apptainer tests
from running on non-Linux systems, where `spython` imports cause
problems.
@jhiemstrawisc jhiemstrawisc force-pushed the explicit-sif-transfer branch from beabea9 to f09c88f Compare April 29, 2026 20:20
@github-actions github-actions Bot removed the merge-conflict This PR has merge conflicts. label Apr 29, 2026
@jhiemstrawisc jhiemstrawisc requested a review from agitter April 29, 2026 20:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants