Fix offsetalignment crash with pre-split (padded) target databases#8
Open
antonvnv wants to merge 1 commit into
Open
Fix offsetalignment crash with pre-split (padded) target databases#8antonvnv wants to merge 1 commit into
antonvnv wants to merge 1 commit into
Conversation
The pskvins branch hardcodes `targetNucl = true` to force target
coordinate adjustment in offsetalignment. This is correct when the
search pipeline itself splits the target via splitsequence (db3 != db4),
but incorrect when the target was pre-split by makepaddedseqdb and
passed directly as the indexed database (db3 == db4).
When db3 == db4, offsetalignment still ran the target update path:
1. Remapped res.dbKey from the padded chunk key to the original
shuffled FASTA key (via ORF header parsing)
2. Added the chunk's `from` offset to res.dbStartPos/dbEndPos
This produced invalid alignment coordinates: result2profile would then
look up the remapped key (wrong entry in the padded DB) and access
positions far beyond the chunk's actual length, causing a segfault in
MultipleAlignment::updateGapsInSequenceSet.
The bug only manifests with databases containing sequences longer than
maxSeqLen (10000), because:
- Unsplit sequences have from=0, so the position adjustment is a no-op
- The key remap happens to be harmless for 1:1 (unsplit) entries
- Only split chunks have from>0, producing inflated positions
Fix: detect pre-split targets by comparing db3 and db4 paths. When they
are identical, skip the target coordinate adjustment entirely since
chunk-relative coordinates are already correct.
Also remove the `|| qloc == NULL` fallback in updateOffset that would
bypass the targetNeedsUpdate=false guard, ensuring the fix is not
circumvented when no query ORF location is available.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The pskvins branch hardcodes
targetNucl = trueto force target coordinate adjustment in offsetalignment. This is correct when the search pipeline itself splits the target via splitsequence (db3 != db4), but incorrect when the target was pre-split by makepaddedseqdb and passed directly as the indexed database (db3 == db4).When db3 == db4, offsetalignment still ran the target update path:
fromoffset to res.dbStartPos/dbEndPosThis produced invalid alignment coordinates: result2profile would then look up the remapped key (wrong entry in the padded DB) and access positions far beyond the chunk's actual length, causing a segfault in MultipleAlignment::updateGapsInSequenceSet.
The bug only manifests with databases containing sequences longer than maxSeqLen (10000), because:
Fix: detect pre-split targets by comparing db3 and db4 paths. When they are identical, skip the target coordinate adjustment entirely since chunk-relative coordinates are already correct.
Also remove the
|| qloc == NULLfallback in updateOffset that would bypass the targetNeedsUpdate=false guard, ensuring the fix is not circumvented when no query ORF location is available.