Fix splitsequence soft-link mode crash with profile DBs by antonvnv · Pull Request #7 · pskvins/MMseqs2

antonvnv · 2026-04-29T01:40:22Z

splitsequence --sequence-split-mode 1 (soft-link) wrote position-based offsets directly as byte offsets into the index. This is correct for sequence DBs (1 byte per position) but wrong for profile DBs where each position occupies PROFILE_READIN_SIZE (27) bytes.

When blastdigp.sh runs an iterative search on a long query (>10,000 nt), extractqueryprofiles produces a dinucleotide profile, and splitsequence splits it in soft-link mode. The resulting index entries pointed to misaligned byte offsets (e.g. byte 10000 instead of 10000 * 27 = 270000), causing ungappedprefilter to read garbage profile data and segfault in Sequence::mapProfile.

The bug was latent because the conditions rarely coincided:

- Target DBs use hard-copy mode (--sequence-split-mode 0) and are unaffected
- Query profiles only get split when the query exceeds maxSeqLen
- Only the pskvins nucleotide pipeline creates profile DBs that pass through splitsequence soft-link mode

Fix: multiply startPos and len by PROFILE_READIN_SIZE when writing soft-link index entries for profile DBs.


antonvnv wants to merge 1 commit into pskvins:master from antonvnv:fix-splitsequence