Skip to content

Fix offsetalignment crash with pre-split (padded) target databases#8

Open
antonvnv wants to merge 1 commit into
pskvins:masterfrom
antonvnv:fix-offsetalignment
Open

Fix offsetalignment crash with pre-split (padded) target databases#8
antonvnv wants to merge 1 commit into
pskvins:masterfrom
antonvnv:fix-offsetalignment

Conversation

@antonvnv
Copy link
Copy Markdown

The pskvins branch hardcodes targetNucl = true to force target coordinate adjustment in offsetalignment. This is correct when the search pipeline itself splits the target via splitsequence (db3 != db4), but incorrect when the target was pre-split by makepaddedseqdb and passed directly as the indexed database (db3 == db4).

When db3 == db4, offsetalignment still ran the target update path:

  1. Remapped res.dbKey from the padded chunk key to the original shuffled FASTA key (via ORF header parsing)
  2. Added the chunk's from offset to res.dbStartPos/dbEndPos

This produced invalid alignment coordinates: result2profile would then look up the remapped key (wrong entry in the padded DB) and access positions far beyond the chunk's actual length, causing a segfault in MultipleAlignment::updateGapsInSequenceSet.

The bug only manifests with databases containing sequences longer than maxSeqLen (10000), because:

  • Unsplit sequences have from=0, so the position adjustment is a no-op
  • The key remap happens to be harmless for 1:1 (unsplit) entries
  • Only split chunks have from>0, producing inflated positions

Fix: detect pre-split targets by comparing db3 and db4 paths. When they are identical, skip the target coordinate adjustment entirely since chunk-relative coordinates are already correct.

Also remove the || qloc == NULL fallback in updateOffset that would bypass the targetNeedsUpdate=false guard, ensuring the fix is not circumvented when no query ORF location is available.

The pskvins branch hardcodes `targetNucl = true` to force target
coordinate adjustment in offsetalignment. This is correct when the
search pipeline itself splits the target via splitsequence (db3 != db4),
but incorrect when the target was pre-split by makepaddedseqdb and
passed directly as the indexed database (db3 == db4).

When db3 == db4, offsetalignment still ran the target update path:
  1. Remapped res.dbKey from the padded chunk key to the original
     shuffled FASTA key (via ORF header parsing)
  2. Added the chunk's `from` offset to res.dbStartPos/dbEndPos

This produced invalid alignment coordinates: result2profile would then
look up the remapped key (wrong entry in the padded DB) and access
positions far beyond the chunk's actual length, causing a segfault in
MultipleAlignment::updateGapsInSequenceSet.

The bug only manifests with databases containing sequences longer than
maxSeqLen (10000), because:
  - Unsplit sequences have from=0, so the position adjustment is a no-op
  - The key remap happens to be harmless for 1:1 (unsplit) entries
  - Only split chunks have from>0, producing inflated positions

Fix: detect pre-split targets by comparing db3 and db4 paths. When they
are identical, skip the target coordinate adjustment entirely since
chunk-relative coordinates are already correct.

Also remove the `|| qloc == NULL` fallback in updateOffset that would
bypass the targetNeedsUpdate=false guard, ensuring the fix is not
circumvented when no query ORF location is available.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant