OncoKB biomarker querying, Reactable output, refined CNA classifications, and RNA fusion support #293
sigven commented on Apr 21, 2026
- Major upgrade and code restructure
- RNA fusions can now be provided as input, using a simple input format
- OncoKB can be queried, provided the user supplies an OncoKB token and, optionally, an OncoTree code designating the tumor type
- OncoKB querying is performed for CNAs, SNVs/InDels, and RNA fusions
- The classification of CNA types (hetdels, homdels, LOH, etc.) has been completely overhauled; the user can now choose between absolute thresholds and thresholds relative to ploidy
- Many tables in the HTML report are now rendered with Reactable, which gives a better visual experience. Some crosstalk slider functionality has been dropped, but much of it remains available by filtering directly in the Reactable table columns
Merge branch 'gnomad_patch' of github.com:sigven/pcgr into gnomad_patch # Conflicts: # .bumpversion.toml # .github/workflows/build_conda_recipes.yaml # conda/env/yml/pcgr.yml # conda/env/yml/pcgrr.yml # conda/env/yml/pkgdown.yml # conda/recipe/pcgr/meta.yaml # conda/recipe/pcgrr/meta.yaml # pcgr/_version.py # pcgrr/DESCRIPTION # pcgrr/data/data_coltype_defs.rda # pcgrr/data/dt_display.rda # pcgrr/data/tsv_cols.rda # pcgrr/vignettes/installation.Rmd # pyproject.toml
Merge branch 'main' into gnomad_patch # Conflicts: # conda/env/lock/pcgr-linux-64.lock # conda/env/lock/pcgr-osx-64.lock # conda/env/lock/pcgrr-linux-64.lock # conda/env/lock/pcgrr-osx-64.lock # conda/env/yml/pcgr.yml # pcgr/annoutils.py # pcgr/arg_checker.py # pcgr/biomarker.py # pcgr/cna.py # pcgr/config.py # pcgr/cpsr.py # pcgr/expression.py # pcgr/maf.py # pcgr/main.py # pcgr/mutation_hotspot.py # pcgr/oncogenicity.py # pcgr/pcgr_vars.py # pcgr/variant.py # pcgr/vcf.py # pcgr/vep.py # pcgrr/R/germline.R # pcgrr/data/data_coltype_defs.rda # pcgrr/data/dt_display.rda # pcgrr/data/tsv_cols.rda # pcgrr/man/assign_germline_popfreq_status.Rd # scripts/cpsr_validate_input.py # scripts/pcgr_summarise.py # scripts/pcgr_vcfanno.py
Review Summary by Qodo
OncoKB biomarker querying, multi-modal input support, refined CNA classifications, and RNA fusion integration
Walkthroughs
Description
- **Major restructure with multi-modal molecular input support**: VCF is now optional; users can provide CNA, RNA fusion, and RNA expression data independently
- **OncoKB biomarker integration**: Added comprehensive OncoKB querying for SNVs/InDels, CNAs, and RNA fusions with API token and OncoTree code support
- **Refined CNA classification system**: Complete overhaul with ploidy-aware thresholds supporting absolute, relative, and combined modes; includes two-hit candidate detection for TSG LOH events
- **RNA fusion support**: Simple input format for RNA fusions with OncoKB annotation capability
- **Improved variant annotation**: Enhanced splice site impact calculations, AlphaMissense integration for oncogenicity scoring, and better protein impact assessment
- **Simplified gnomAD filtering**: Replaced multiple population-specific thresholds with single gnomad_popmax_af_tolerated parameter
- **Reactable table integration**: Many HTML report tables now use Reactable for improved visual experience
- **Database and tool updates**: Updated to GENCODE 49, VEP 115, database version 20260417, and R minimum version 4.1
- **Code organization improvements**: Centralized vcfanno track configuration, modular CNA annotation functions, and better input validation with flexible molecular data type handling
Diagram
flowchart LR
VCF["VCF Input"]
CNA["CNA Input"]
RNA_F["RNA Fusion Input"]
RNA_E["RNA Expression Input"]
VCF --> SNV["SNV/InDel Analysis"]
CNA --> CNA_ANN["CNA Analysis<br/>with Ploidy Thresholds"]
RNA_F --> FUSION["RNA Fusion Analysis"]
RNA_E --> EXPR["Expression Analysis"]
SNV --> ONCOKB["OncoKB Annotation"]
CNA_ANN --> ONCOKB
FUSION --> ONCOKB
ONCOKB --> BIOMARKER["Biomarker Integration<br/>CGI, CIViC, OncoKB"]
EXPR --> BIOMARKER
BIOMARKER --> REPORT["HTML Report<br/>with Reactable Tables"]
File Changes
1. pcgr/main.py
Code Review by Qodo
if not check_file_exists(oncotree_fname, logger):
    warn_message(
        f"OncoTree annotation file not found: {oncotree_fname} "
        "- skipping OncoTree code validation", logger)
    return oncotree_code
2. Oncotree check_file_exists misuse 🐞 Bug ☼ Reliability
`verify_oncotree_code()` calls `check_file_exists(oncotree_fname, logger)` with a positional second argument, which binds to `strict` rather than `logger`. If the OncoTree TSV is missing, the call raises via `error_message` instead of warning and continuing as intended.
Agent Prompt
### Issue description
`verify_oncotree_code()` incorrectly calls `check_file_exists()` with positional arguments, causing missing OncoTree reference data to become fatal.
### Issue Context
The code path is meant to *skip validation* when the OncoTree mapping file is absent.
### Fix Focus Areas
- pcgr/config.py[487-495]
- pcgr/utils.py[229-243]
### Suggested fix
Change the call to use keyword arguments and non-strict behavior:
```python
if not check_file_exists(oncotree_fname, strict=False, logger=logger):
    warn_message(...)
    return oncotree_code
```
(or replace with an `os.path.isfile` check if you don't want file-size checks here).
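The positional-binding problem in this finding can be reproduced in isolation. The sketch below uses a stand-in `check_file_exists` whose parameter order `(fname, strict=True, logger=None)` is an assumption inferred from the finding, not copied from `pcgr/utils.py`; the stand-in also skips the file-size check and raises `FileNotFoundError` where the real helper is described as going through `error_message`.

```python
import logging
import os


def check_file_exists(fname, strict=True, logger=None):
    """Stand-in helper; the (fname, strict, logger) order is an assumption."""
    if os.path.isfile(fname):
        return True
    if strict:
        # Assumed fatal path, standing in for the real error_message() call
        raise FileNotFoundError(f"File not found: {fname}")
    if logger is not None:
        logger.warning("File not found: %s", fname)
    return False


logger = logging.getLogger("oncotree-demo")
missing = "/nonexistent/oncotree_demo.tsv"  # hypothetical path that does not exist

# Positional call: the Logger object binds to `strict`. A Logger is truthy,
# so the helper takes the fatal branch instead of returning False.
try:
    check_file_exists(missing, logger)
    positional_outcome = "returned"
except FileNotFoundError:
    positional_outcome = "raised"

# Keyword call: strict=False lets the caller warn and continue.
keyword_outcome = check_file_exists(missing, strict=False, logger=logger)
```

Declaring `strict` and `logger` as keyword-only parameters (a bare `*` after `fname`) would turn this kind of call into an immediate `TypeError`, though that is a design option rather than something the review proposes.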
conf_options['oncokb'] = {
    'api_token': str(arg_dict['oncokb_api_token']) if arg_dict['oncokb_api_token'] is not None else None,
    'oncotree_code': str(arg_dict['oncokb_oncotree_code']) if arg_dict['oncokb_oncotree_code'] is not None else None,
    'exclusive': int(arg_dict['oncokb_exclusive']),
    'data_version': None,
    'data_release_date': None,
    'api_version': None
}
3. Oncokb token persisted to yaml 🐞 Bug ⛨ Security
The OncoKB API token is stored in conf_options and ends up written to <sample>.conf.yaml, leaking credentials into a shareable output artifact. The code even contains a commented-out redaction block, but it is not executed.
Agent Prompt
### Issue description
The OncoKB API token is written to the output YAML config file, which can be shared/archived and exposes credentials.
### Issue Context
PCGR writes `<output_prefix>.conf.yaml` for downstream reporting. That file should not contain secrets.
### Fix Focus Areas
- pcgr/config.py[114-125]
- pcgr/main.py[990-993]
- pcgr/main.py[1019-1026]
### Suggested fix
Implement one of:
1. **Do not serialize the token**: before writing YAML, set `yaml_data['conf']['oncokb']['api_token'] = None` (or delete the key).
2. **Keep token only in-memory**: store token outside `conf_options`/`yaml_data` entirely and pass it only to the OncoKB execution step.
At minimum, uncomment/activate the existing redaction logic (but preferably redact *before* writing the file the first time).
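A minimal sketch of the first option, assuming the configuration is a nested dict reachable as `yaml_data['conf']['oncokb']['api_token']` (the path quoted in the suggested fix). The function name `redact_oncokb_token` and the sample values are hypothetical and not part of PCGR.

```python
import copy


def redact_oncokb_token(yaml_data):
    """Return a deep copy of yaml_data with the OncoKB API token blanked.

    Hypothetical helper: the caller keeps the original dict (token intact)
    for the in-memory OncoKB step and serializes only the returned copy.
    """
    redacted = copy.deepcopy(yaml_data)
    oncokb = redacted.get("conf", {}).get("oncokb")
    if isinstance(oncokb, dict) and oncokb.get("api_token") is not None:
        oncokb["api_token"] = None
    return redacted


# Illustrative values only
yaml_data = {
    "conf": {
        "oncokb": {
            "api_token": "SECRET-TOKEN",
            "oncotree_code": "LUAD",
            "exclusive": 0,
        }
    }
}

safe = redact_oncokb_token(yaml_data)
```

Working on a deep copy leaves the in-memory configuration untouched, so the token is still available to the OncoKB query step while only the redacted copy would be handed to the YAML writer.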