feat(alphafold): accept a user-provided custom MSA input (#52) #235

Draft
Elarwei001 wants to merge 1 commit into scverse:dev from Elarwei001:feature/alphafold-custom-msa-52

@Elarwei001

Contributor

Resolves #52

Summary

gget alphafold: Added a new msa argument (-msa/--msa on the command line) that accepts a custom multiple sequence alignment (a3m or aligned FASTA) in place of the internal jackhmmer search. The first sequence in the MSA must be the query. This makes it possible to fold from a manually curated MSA and skips the genetic database download entirely. Custom MSAs are currently supported only for single-sequence (monomer) predictions.
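The requirement that the first MSA sequence be the query can be checked before any folding work starts. The sketch below is illustrative only and is not the code in this PR: `read_records` and `check_query_first` are hypothetical names, and it assumes that gaps are written as `-` and that the comparison is case-insensitive.

```python
from typing import List, Tuple


def read_records(text: str) -> List[Tuple[str, str]]:
    """Split FASTA-style text into (header, sequence) pairs.

    a3m files use the same '>' header layout, so this works for both.
    """
    records: List[Tuple[str, str]] = []
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_query_first(msa_text: str, query: str) -> None:
    """Raise ValueError unless the first MSA row, with gaps removed, equals the query."""
    records = read_records(msa_text)
    if not records:
        raise ValueError("the MSA contains no sequences")
    first_row = records[0][1].replace("-", "").upper()
    if first_row != query.upper():
        raise ValueError("the first sequence in the MSA must be the query")


# Hypothetical three-residue example: the query row comes first.
example_msa = ">query\nMKV\n>hit_1\nM-V\n"
check_query_first(example_msa, "MKV")  # passes silently
```

Failing early on a mismatched first row gives a clear error before any model weights or features are touched, which is the kind of validation the unit tests can cover without AlphaFold installed.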

Testing

Unit tests live in tests/test_alphafold.py, with a3m and FASTA fixtures in tests/fixtures/. They cover parsing and validation only, because AlphaFold's heavy dependencies are unavailable in CI. Run them with pytest.

Add an `msa` parameter (Python) / -msa, --msa flag (CLI) to gget
alphafold that lets users supply a custom multiple sequence alignment
(a3m or aligned FASTA) instead of running the internal jackhmmer search.
The first sequence in the MSA must be the query.

When a custom MSA is provided, gget skips the jackhmmer search and the
genetic-database download entirely and builds the MSA features directly
from the file. New helpers detect_msa_format()/parse_custom_msa()/
read_custom_msa() handle format detection and a3m/FASTA parsing
(lowercase a3m insertions are folded into the deletion matrix, matching
AlphaFold's own parser). Currently supported for single-sequence
(monomer) predictions; clear errors are raised otherwise. Default
behavior is unchanged (backward compatible).
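To illustrate what "folding lowercase insertions into the deletion matrix" means, here is a minimal sketch of that step. It is not the PR's parse_custom_msa(); the function name `fold_insertions` is hypothetical, and the logic follows the a3m convention as I understand it: lowercase letters are insertions relative to the query, they are dropped from the aligned row, and their count is recorded at the next aligned column.

```python
from typing import List, Tuple


def fold_insertions(rows: List[str]) -> Tuple[List[str], List[List[int]]]:
    """Convert raw a3m rows into aligned rows plus a deletion matrix.

    deletion_matrix[i][j] is the number of lowercase (inserted) residues
    that appeared in row i immediately before aligned column j.
    """
    aligned: List[str] = []
    deletion_matrix: List[List[int]] = []
    for row in rows:
        kept: List[str] = []
        deletions: List[int] = []
        pending = 0  # insertions seen since the last aligned column
        for char in row:
            if char.islower():
                pending += 1
            else:
                kept.append(char)
                deletions.append(pending)
                pending = 0
        # Insertions after the last aligned column have no column to
        # attach to and are dropped in this sketch.
        aligned.append("".join(kept))
        deletion_matrix.append(deletions)
    if len({len(seq) for seq in aligned}) > 1:
        raise ValueError("rows have different lengths after removing insertions")
    return aligned, deletion_matrix


# Hypothetical rows: the second hit carries one insertion ('a') before 'K'.
rows = ["MKV", "MaKV", "M-V"]
aligned, deletion_matrix = fold_insertions(rows)
print(aligned)          # ['MKV', 'MKV', 'M-V']
print(deletion_matrix)  # [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
```

An aligned FASTA file with no lowercase letters passes through this step unchanged and yields an all-zero deletion matrix, so both supported formats can share one code path.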

Resolves scverse#52.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@codecov-commenter

codecov-commenter commented Jun 24, 2026


Codecov Report

❌ Patch coverage is 38.73874% with 68 lines in your changes missing coverage. Please review.
✅ Project coverage is 56.45%. Comparing base (5cf607f) to head (22b4993).
⚠️ Report is 1 commit behind head on dev.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| gget/gget_alphafold.py | 38.73% | 68 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##              dev     #235      +/-   ##
==========================================
+ Coverage   56.14%   56.45%   +0.31%     
==========================================
  Files          29       29              
  Lines        9244     9317      +73     
==========================================
+ Hits         5190     5260      +70     
- Misses       4054     4057       +3     


Elarwei001 marked this pull request as draft on June 25, 2026, 03:44