Fix offsetalignment crash with pre-split (padded) target databases by antonvnv · Pull Request #8 · pskvins/MMseqs2

antonvnv · 2026-04-29T01:40:43Z

The pskvins branch hardcodes `targetNucl = true` to force target coordinate adjustment in offsetalignment. This is correct when the search pipeline itself splits the target via splitsequence (db3 != db4), but incorrect when the target was pre-split by makepaddedseqdb and passed directly as the indexed database (db3 == db4).

When db3 == db4, offsetalignment still ran the target update path:

1. Remapped res.dbKey from the padded chunk key to the original shuffled FASTA key (via ORF header parsing)
2. Added the chunk's `from` offset to res.dbStartPos/dbEndPos

This produced invalid alignment coordinates: result2profile would then look up the remapped key (wrong entry in the padded DB) and access positions far beyond the chunk's actual length, causing a segfault in MultipleAlignment::updateGapsInSequenceSet.

The bug only manifests with databases containing sequences longer than maxSeqLen (10000), because:

- Unsplit sequences have from=0, so the position adjustment is a no-op
- The key remap happens to be harmless for 1:1 (unsplit) entries
- Only split chunks have from>0, producing inflated positions
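
The arithmetic behind these cases can be sketched in isolation. This is a minimal model, not the MMseqs2 code: `Hit`, `ChunkLocation`, `applyTargetUpdate` and `fitsInEntry` are hypothetical names, and the `from` value of 10000 used in the usage note is an illustrative assumption for a second chunk.

```cpp
#include <cassert>

// Hypothetical stand-in for one alignment hit; only the target fields matter here.
struct Hit {
    unsigned int dbKey;
    int dbStartPos;
    int dbEndPos;
};

// Hypothetical stand-in for the location parsed from a chunk's header.
struct ChunkLocation {
    unsigned int originalKey; // key of the unsplit source sequence
    int from;                 // offset of the chunk within that sequence
};

// Models the target update path: remap the key, then shift both positions.
Hit applyTargetUpdate(Hit hit, const ChunkLocation &loc) {
    hit.dbKey = loc.originalKey;
    hit.dbStartPos += loc.from;
    hit.dbEndPos += loc.from;
    return hit;
}

// True if the hit's positions can be read from an entry of the given length.
bool fitsInEntry(const Hit &hit, int entryLen) {
    assert(entryLen > 0);
    return hit.dbStartPos >= 0 && hit.dbEndPos < entryLen;
}
```

With a hit at positions 100 to 250, `applyTargetUpdate` with `from = 0` leaves the positions unchanged, so they still fit a 10000-residue entry. With `from = 10000` the same hit moves to 10100 to 10250, which no longer fits the chunk it was aligned against.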

Fix: detect pre-split targets by comparing db3 and db4 paths. When they are identical, skip the target coordinate adjustment entirely since chunk-relative coordinates are already correct.
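
A minimal sketch of that detection, assuming the two target paths are available as plain strings. The helper name `targetNeedsUpdate` mirrors the flag mentioned in this description, but the function itself is hypothetical and is not taken from the patch.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: the target is treated as pre-split when both target
// arguments (db3 and db4 in this description) name the same path. In that
// case the coordinates are already chunk-relative and must not be shifted.
bool targetNeedsUpdate(const std::string &db3, const std::string &db4) {
    assert(!db3.empty() && !db4.empty());
    return db3 != db4;
}
```

A plain string comparison like this does not canonicalize paths, so `db` and `./db` would count as different databases. Whether that matters depends on how the calling workflow builds the two arguments.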

Also remove the `|| qloc == NULL` fallback in updateOffset that would bypass the targetNeedsUpdate=false guard, ensuring the fix is not circumvented when no query ORF location is available.
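
The effect of dropping the fallback can be shown with the condition alone. This sketch assumes the check has the shape described above. `QueryLocation` and both function names are hypothetical, and the real updateOffset does more than evaluate this condition.

```cpp
// Hypothetical stand-in for a parsed query ORF location.
struct QueryLocation {
    int from;
};

// Condition as described before the change: a missing query location
// forces the target update even when the guard says to skip it.
bool runsTargetUpdateOld(bool targetNeedsUpdate, const QueryLocation *qloc) {
    return targetNeedsUpdate || qloc == nullptr;
}

// Condition after the change: only the guard decides.
bool runsTargetUpdateNew(bool targetNeedsUpdate, const QueryLocation * /*qloc*/) {
    return targetNeedsUpdate;
}
```

For a pre-split target (`targetNeedsUpdate == false`) with no query location, the old condition evaluates to true and the new one to false, which is the case this part of the fix closes.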

antonvnv wants to merge 1 commit into pskvins:master from antonvnv:fix-offsetalignment
