Draft: try adding capsid variant calling and resistance #109
schorlton-bugseq wants to merge 1 commit into PoonLab:master from
Hi! I will look into the questions more closely, but from what I can recall and find:
We pull the comments from https://github.com/hivdb/hivfacts/tree/main/data, which is the repository for the data found on https://hivdb.stanford.edu. The script in
I think this is a symptom of not having the necessary files to call lenacapavir from the
I have a memory that it was on the https://hivdb.stanford.edu website, but I couldn't find it after looking around. I will keep looking, I'm pretty sure it was there.
It matches the coordinates from HIVDB facts: I think we got our positions for RT, IN, PR from here
|
Thanks so much! I forgot to mention that I was using a freshly downloaded database (beyond just the XML) obtained with that script; however, when calling
sierra-local/sierralocal/main.py, lines 125 to 126 in 2896b83
sierra-local/sierralocal/jsonwriter.py, lines 29 to 39 in 2896b83
The lenacapavir entry looked to me like the other drugs in the XML (https://github.com/hivdb/hivfacts/blob/main/data/algorithms/HIVDB_9.8.xml#L2719), so I was thinking it may instead be that sierralocal omits alignment/examination of capsid in general, although I'm really not certain at this point. Thank you both again, and I'm happy to help in your investigation too.
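One way to check whether an algorithm file lists a gene that the local pipeline never examines is to compare the genes defined in the XML against the genes the code handles. The sketch below is only an illustration: the inline XML is a made-up, heavily trimmed fragment, the element names (`GENE_DEFINITION`, `DRUGCLASS`, `DRUGLIST`) are my assumption about the ASI layout, and `drugs_by_gene` and `unhandled_genes` are hypothetical helpers that do not exist in sierralocal.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed fragment. Element names are assumed, not copied
# from HIVDB_9.8.xml, and no scoring rules are included.
SAMPLE_XML = """
<ALGORITHM>
  <DEFINITIONS>
    <GENE_DEFINITION><NAME>CA</NAME><DRUGCLASSLIST>CAI</DRUGCLASSLIST></GENE_DEFINITION>
    <GENE_DEFINITION><NAME>PR</NAME><DRUGCLASSLIST>PI</DRUGCLASSLIST></GENE_DEFINITION>
    <DRUGCLASS><NAME>CAI</NAME><DRUGLIST>LEN</DRUGLIST></DRUGCLASS>
    <DRUGCLASS><NAME>PI</NAME><DRUGLIST>ATV/r,DRV/r,LPV/r</DRUGLIST></DRUGCLASS>
  </DEFINITIONS>
</ALGORITHM>
"""


def split_list(text):
    """Split a comma-separated element body into trimmed, non-empty items."""
    return [item.strip() for item in (text or "").split(",") if item.strip()]


def drugs_by_gene(xml_text):
    """Map each gene named in the XML to the drugs of its drug classes."""
    root = ET.fromstring(xml_text)
    class_to_drugs = {
        dc.findtext("NAME").strip(): split_list(dc.findtext("DRUGLIST", ""))
        for dc in root.iter("DRUGCLASS")
    }
    result = {}
    for gd in root.iter("GENE_DEFINITION"):
        gene = gd.findtext("NAME").strip()
        classes = split_list(gd.findtext("DRUGCLASSLIST", ""))
        result[gene] = [d for c in classes for d in class_to_drugs.get(c, [])]
    return result


def unhandled_genes(xml_text, handled=("PR", "RT", "IN")):
    """Genes present in the XML that the (assumed) local pipeline skips."""
    return sorted(g for g in drugs_by_gene(xml_text) if g not in handled)


if __name__ == "__main__":
    print(drugs_by_gene(SAMPLE_XML))
    print(unhandled_genes(SAMPLE_XML))
```

If a gene such as CA shows up in the second list, the drug is defined in the algorithm file but nothing downstream would ever score it, which is the kind of omission suspected here.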
|
Ah, I think I see some problems. Looking through my old commits, I directly accessed the mutation files in
sierra-local/sierralocal/jsonwriter.py, lines 88 to 102 in 2896b83
and then checked for them further down the same file so they could be written into the JSON output:
sierra-local/sierralocal/jsonwriter.py, lines 532 to 548 in 2896b83
However, when you call the

I think you're right that the data is not being used. We don't have a dedicated dictionary for the capsid protein, so it's just ignored. Your changes should include it, but the issue may arise from this function, which adjusts gene positions with respect to pol:
sierra-local/sierralocal/nucaminohook.py, lines 360 to 371 in 2896b83
As a result, CA is assigned negative indices, which are then filtered out here:
sierra-local/sierralocal/nucaminohook.py, lines 373 to 413 in 2896b83

However, I ran sierrapy on a few random complete HIV sequences from the NCBI nucleotide database and found that it also only picked up PR, IN, and RT. Do you have a sequence for which sierrapy captures CA?

I was also hunting for the comments, and I want to say they're included in the XML files now? That probably needs more checking, though.
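To make the negative-index problem concrete, here is a minimal sketch of what happens when gene starts are re-expressed relative to pol and anything negative is dropped. It is not the code in nucaminohook.py: `to_pol_aa_offsets` and `keep_nonnegative` are hypothetical names, and the HXB2 nucleotide coordinates are quoted from memory, so treat them as illustrative and check them against a reference before relying on them.

```python
# HXB2 nucleotide coordinates (1-based start, end), quoted from memory
# for illustration only.
HXB2_REGIONS = {
    "CA": (1186, 1878),  # capsid (p24), part of gag, upstream of pol
    "PR": (2253, 2549),
    "RT": (2550, 3869),
    "IN": (4230, 5093),
}
POL_START = 2085  # assumed first nucleotide of pol in HXB2


def to_pol_aa_offsets(regions, pol_start=POL_START):
    """Express each gene start as an amino-acid offset from the start of pol.

    Floor division is used, so regions upstream of pol get negative offsets.
    gag and pol are in different reading frames, so the CA value is only a
    rough indicator of position, not a real codon index.
    """
    return {gene: (start - pol_start) // 3
            for gene, (start, _end) in regions.items()}


def keep_nonnegative(offsets):
    """Mimic a downstream filter that discards negative positions."""
    return {gene: off for gene, off in offsets.items() if off >= 0}


if __name__ == "__main__":
    offsets = to_pol_aa_offsets(HXB2_REGIONS)
    print(offsets)
    print(sorted(keep_nonnegative(offsets)))
```

Because CA sits in gag, upstream of pol, any pol-relative coordinate system gives it a negative offset, so supporting it would mean either a gag-relative origin for CA or a filter that does not assume everything lies inside pol.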
Starting to work on #108
Thanks @ArtPoon for the initial response. It's still not working, so @WilliamZekaiWang, I would definitely value your input here.
Open questions from the PR so far (apologies as I am not an HIV expert):
Where were the min_overlap thresholds borrowed from? I could not find them in a quick search of the sierra/sierrapy repos.