
OncoKB biomarker querying, Reactable output, refined CNA classifications, and RNA fusion support#293

Open
sigven wants to merge 57 commits into dev from gnomad_patch

@sigven (Owner) commented Apr 21, 2026

  • Major upgrade and code restructure
    • RNA fusions can now be provided as input, using a simple input format
    • Ability to query OncoKB, provided the user supplies an OncoKB API token and, optionally, an OncoTree code designating the tumor type
    • OncoKB querying is performed for CNAs, SNVs/InDels, and RNA fusions
    • The classification of CNA types (hetdels, homdels, LOH, etc.) has been completely overhauled; users can now choose between absolute thresholds and thresholds relative to ploidy
    • Many tables in the HTML report are now displayed through Reactable, which improves the visual experience. Some crosstalk slider functionality has been dropped, but much of it remains available by filtering directly in the Reactable table columns
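To make the absolute-versus-relative distinction concrete, here is a minimal Python sketch of per-segment CNA classification. The class and function names, every cut-off value, and the LOH rule are hypothetical and chosen only for illustration; they are not PCGR's defaults, and the "combined" mode mentioned in the review summary is not modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CnaThresholds:
    """Hypothetical cut-offs, for illustration only (not PCGR defaults)."""
    amp_abs: float = 6.0    # absolute total copy number for amplification
    gain_abs: float = 3.0   # absolute total copy number for gain
    del_abs: float = 1.0    # absolute total copy number at/below which a hetdel is called
    amp_rel: float = 2.0    # amplification cut-off as a multiple of ploidy
    gain_rel: float = 1.25  # gain cut-off as a multiple of ploidy
    del_rel: float = 0.6    # hetdel cut-off as a fraction of ploidy


def classify_cna(total_cn: float, minor_cn: int, ploidy: float,
                 mode: str = "absolute",
                 thresholds: Optional[CnaThresholds] = None) -> str:
    """Assign a coarse CNA class to one segment (sketch)."""
    t = thresholds or CnaThresholds()
    if mode == "absolute":
        amp_cut, gain_cut, del_cut = t.amp_abs, t.gain_abs, t.del_abs
    elif mode == "relative":
        # Cut-offs scale with the estimated tumor ploidy
        amp_cut = t.amp_rel * ploidy
        gain_cut = t.gain_rel * ploidy
        del_cut = t.del_rel * ploidy
    else:
        raise ValueError(f"unsupported mode: {mode!r}")

    if total_cn == 0:
        return "homdel"
    if total_cn >= amp_cut:
        return "amplification"
    if total_cn >= gain_cut:
        return "gain"
    if total_cn <= del_cut:
        return "hetdel"
    if minor_cn == 0:
        return "loh"  # e.g. copy-neutral LOH: one allele lost, total CN otherwise unremarkable
    return "neutral"
```

With these illustrative numbers, a segment with total copy number 2 and one copy of each allele is `neutral` at ploidy 2 in either mode, but becomes a `hetdel` at ploidy 4 in relative mode, which is the point of ploidy-relative thresholds.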

sigven and others added 30 commits May 12, 2025 15:03
Merge branch 'gnomad_patch' of github.com:sigven/pcgr into gnomad_patch

# Conflicts:
#	.bumpversion.toml
#	.github/workflows/build_conda_recipes.yaml
#	conda/env/yml/pcgr.yml
#	conda/env/yml/pcgrr.yml
#	conda/env/yml/pkgdown.yml
#	conda/recipe/pcgr/meta.yaml
#	conda/recipe/pcgrr/meta.yaml
#	pcgr/_version.py
#	pcgrr/DESCRIPTION
#	pcgrr/data/data_coltype_defs.rda
#	pcgrr/data/dt_display.rda
#	pcgrr/data/tsv_cols.rda
#	pcgrr/vignettes/installation.Rmd
#	pyproject.toml
Merge branch 'main' into gnomad_patch

# Conflicts:
#	conda/env/lock/pcgr-linux-64.lock
#	conda/env/lock/pcgr-osx-64.lock
#	conda/env/lock/pcgrr-linux-64.lock
#	conda/env/lock/pcgrr-osx-64.lock
#	conda/env/yml/pcgr.yml
#	pcgr/annoutils.py
#	pcgr/arg_checker.py
#	pcgr/biomarker.py
#	pcgr/cna.py
#	pcgr/config.py
#	pcgr/cpsr.py
#	pcgr/expression.py
#	pcgr/maf.py
#	pcgr/main.py
#	pcgr/mutation_hotspot.py
#	pcgr/oncogenicity.py
#	pcgr/pcgr_vars.py
#	pcgr/variant.py
#	pcgr/vcf.py
#	pcgr/vep.py
#	pcgrr/R/germline.R
#	pcgrr/data/data_coltype_defs.rda
#	pcgrr/data/dt_display.rda
#	pcgrr/data/tsv_cols.rda
#	pcgrr/man/assign_germline_popfreq_status.Rd
#	scripts/cpsr_validate_input.py
#	scripts/pcgr_summarise.py
#	scripts/pcgr_vcfanno.py

Review Summary by Qodo

OncoKB biomarker querying, multi-modal input support, refined CNA classifications, and RNA fusion integration

✨ Enhancement


Walkthroughs

Description
• **Major restructure with multi-modal molecular input support**: VCF is now optional; users can
  provide CNA, RNA fusion, and RNA expression data independently
• **OncoKB biomarker integration**: Added comprehensive OncoKB querying for SNVs/InDels, CNAs, and
  RNA fusions with API token and OncoTree code support
• **Refined CNA classification system**: Complete overhaul with ploidy-aware thresholds supporting
  absolute, relative, and combined modes; includes two-hit candidate detection for TSG LOH events
• **RNA fusion support**: Simple input format for RNA fusions with OncoKB annotation capability
• **Improved variant annotation**: Enhanced splice site impact calculations, AlphaMissense
  integration for oncogenicity scoring, and better protein impact assessment
• **Simplified gnomAD filtering**: Replaced multiple population-specific thresholds with a single
  gnomad_popmax_af_tolerated parameter
• **Reactable table integration**: Many HTML report tables now use Reactable for improved visual
  experience
• **Database and tool updates**: Updated to GENCODE 49, VEP 115, database version 20260417, and a
  minimum R version of 4.1
• **Code organization improvements**: Centralized vcfanno track configuration, modular CNA
  annotation functions, and better input validation with flexible molecular data type handling
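The single-threshold gnomAD filter described above can be sketched as "take the highest allele frequency across populations and compare it to one tolerated value". This is a minimal Python sketch, not PCGR's code: the population keys used in the example (such as `afr_exome`) are hypothetical, and keeping variants that are absent from gnomAD is an assumption.

```python
from typing import Mapping, Optional


def popmax_af(pop_afs: Mapping[str, Optional[float]]) -> Optional[float]:
    """Highest allele frequency across gnomAD populations, ignoring missing values."""
    observed = [af for af in pop_afs.values() if af is not None]
    return max(observed) if observed else None


def passes_popmax_filter(pop_afs: Mapping[str, Optional[float]],
                         gnomad_popmax_af_tolerated: float) -> bool:
    """Keep a variant unless its popmax AF exceeds the tolerated value.

    Variants with no observed gnomAD AF are kept (assumption).
    """
    af = popmax_af(pop_afs)
    return af is None or af <= gnomad_popmax_af_tolerated
```

For example, `passes_popmax_filter({"afr_exome": 0.01, "nfe_genome": None}, 0.002)` returns `False`, because the highest observed population frequency (0.01) exceeds the illustrative tolerated value of 0.002.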
Diagram
flowchart LR
  VCF["VCF Input"]
  CNA["CNA Input"]
  RNA_F["RNA Fusion Input"]
  RNA_E["RNA Expression Input"]
  
  VCF --> SNV["SNV/InDel Analysis"]
  CNA --> CNA_ANN["CNA Analysis<br/>with Ploidy Thresholds"]
  RNA_F --> FUSION["RNA Fusion Analysis"]
  RNA_E --> EXPR["Expression Analysis"]
  
  SNV --> ONCOKB["OncoKB Annotation"]
  CNA_ANN --> ONCOKB
  FUSION --> ONCOKB
  
  ONCOKB --> BIOMARKER["Biomarker Integration<br/>CGI, CIViC, OncoKB"]
  EXPR --> BIOMARKER
  
  BIOMARKER --> REPORT["HTML Report<br/>with Reactable Tables"]

File Changes

1. pcgr/main.py ✨ Enhancement +659/-335

Major restructure: multi-modal input support and OncoKB integration

• Restructured argument parser to support multiple optional molecular input types (VCF, CNA, RNA
 fusion, RNA expression) instead of requiring VCF
• Added OncoKB biomarker querying support with API token and oncotree code parameters
• Reorganized workflow sections with improved logging and section naming (e.g., "SNV/INDEL ANALYSIS
 SECTION", "CNA ANALYSIS SECTION")
• Integrated OncoKB annotation functions for SNVs/InDels, CNAs, and RNA fusions with two-hit
 candidate detection
• Refactored temporary file definitions and YAML configuration generation to support all molecular
 data types
• Added CNA threshold mode configuration (absolute, relative, combined) with separate thresholds for
 amplifications, gains, and deletions
• Simplified gnomAD MAF filtering to a single gnomad_popmax_af_tolerated parameter, replacing
 multiple population-specific thresholds

pcgr/main.py


2. pcgr/oncogenicity.py ✨ Enhancement +258/-70

OncoKB refinement and AlphaMissense integration for oncogenicity

• Added support for AlphaMissense predictions in oncogenicity scoring (ONCG_OP1 and ONCG_SBP1
 criteria)
• Refactored gnomAD population frequency checking to handle both exome and genome datasets
 separately
• Extracted oncogenicity score-to-classification logic into reusable
 _classify_oncogenicity_score() function
• Added new refine_oncogenicity_with_oncokb() function for second-pass refinement using OncoKB
 annotations
• Introduced _sort_oncogenicity_codes() helper for canonical ordering of evidence codes
• Enhanced robustness with safer VCF record INFO field access and improved handling of missing/null
 values

pcgr/oncogenicity.py


3. pcgr/utils.py ✨ Enhancement +51/-11

Utility functions for CSV export and DNA operations

• Added pd_to_csv() utility function for consistent DataFrame-to-CSV export with NaN/empty string
 handling
• Fixed logger initialization to prevent duplicate handlers when logger already exists
• Added reverse_complement_dna() utility function for DNA sequence operations

pcgr/utils.py
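The duplicate-handler fix and the DNA helper listed for `pcgr/utils.py` follow common patterns, sketched below in Python. This is an illustrative stand-in, not the PR's code: `get_logger` is a hypothetical name and the log format string is an assumption. The standard approach is to attach a handler only if the named logger has none yet, since `logging.getLogger()` returns the same object for the same name.

```python
import logging
import sys

# Translation table for complementing A/C/G/T/N in either case
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def get_logger(name: str, level: int = logging.INFO) -> logging.Logger:
    """Return a named logger, attaching a stream handler only once."""
    logger = logging.getLogger(name)  # same object for the same name
    logger.setLevel(level)
    if not logger.handlers:  # repeated calls reuse the existing handler
        handler = logging.StreamHandler(sys.stderr)
        handler.setFormatter(
            logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s"))
        logger.addHandler(handler)
    return logger


def reverse_complement_dna(seq: str) -> str:
    """Reverse complement of a DNA string; characters other than A/C/G/T/N are left unchanged."""
    return seq.translate(_COMPLEMENT)[::-1]
```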


4. pcgr/mutation_hotspot.py Formatting +2/-2

Minor cleanup of unused variable

• Commented out unused principal_entrezgene variable assignment

pcgr/mutation_hotspot.py


5. pcgr/vcf.py Formatting +1/-0

Minor formatting adjustment

• Added blank line after function definition for formatting consistency

pcgr/vcf.py


6. pcgrr/man/plot_filtering_stats_germline.Rd 📝 Documentation +2/-241

Documentation update for germline filtering plot

• Updated function title from "UpSet plot" description to "pie chart for germline filtering
 statistics"

pcgrr/man/plot_filtering_stats_germline.Rd


7. conda/recipe/pcgrr/meta.yaml ⚙️ Configuration changes +1/-1

Version downgrade in conda recipe

• Downgraded version from 2.2.5 to 2.2.2

conda/recipe/pcgrr/meta.yaml


8. pcgr/cna.py ✨ Enhancement +1071/-172

Major CNA annotation refactor with ploidy-aware thresholds and OncoKB integration

• Added comprehensive ploidy estimation function using weighted median approach with focal
 amplification filtering
• Implemented OncoKB integration for CNAs and RNA fusions with input generation and annotation
 merging functions
• Refactored CNA classification with ploidy-aware thresholds supporting absolute, relative, and
 combined modes
• Added two-hit candidate annotation for TSG LOH events with somatic and germline variant matching
• Introduced modular annotation functions for amplifications, gains, deletions, LOH, and variant
 class assignment
• Enhanced transcript annotation with deduplication logic for multi-segment transcripts and
 Entrezgene mapping

pcgr/cna.py
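A weighted-median ploidy estimate with focal amplification filtering can be sketched as follows: drop short, high-level amplified segments, then take the copy number at which the cumulative segment length reaches half of the total. This is a minimal Python sketch under stated assumptions. The segment tuple layout, the 3 Mb focal length cut-off, and the copy-number cut-off of 6 are hypothetical, and PCGR's actual weighting and filtering rules are not shown in this PR summary.

```python
from typing import List, Sequence, Tuple

# (start, end, total_copy_number); half-open [start, end) coordinates assumed
Segment = Tuple[int, int, float]


def weighted_median(values: Sequence[float], weights: Sequence[float]) -> float:
    """Smallest value whose cumulative weight reaches half of the total weight."""
    pairs = sorted(zip(values, weights))
    if not pairs:
        raise ValueError("no values")
    half = sum(w for _, w in pairs) / 2.0
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= half:
            return value
    return pairs[-1][0]  # defensive fallback; not reached for non-negative weights


def estimate_ploidy(segments: Sequence[Segment],
                    focal_max_len: int = 3_000_000,
                    focal_min_cn: float = 6.0) -> float:
    """Length-weighted median copy number, ignoring short high-level (focal) amplifications."""
    kept: List[Tuple[int, float]] = [
        (end - start, cn) for start, end, cn in segments
        if not (end - start < focal_max_len and cn >= focal_min_cn)
    ]
    if not kept:
        raise ValueError("no segments left after focal amplification filtering")
    return weighted_median([cn for _, cn in kept], [length for length, _ in kept])
```

Filtering before taking the median matters when focal amplicons cover a large share of the segmented length: without it, a few short, extreme segments could pull the estimate upward.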


9. scripts/validate_input_cpsr.py ✨ Enhancement +178/-142

Add germline structural variant validation support to CPSR

• Added new validate_cpsr_sv_input() function to validate germline structural variant VCF files
• Implemented SV-specific checks for SVTYPE INFO tag, END/SVLEN presence, and symbolic allele
 validation
• Refactored simplify_vcf() to use new helper functions bgzip_sort_filter_vcf() and
 detect_multiallelic_sites()
• Updated validate_cpsr_input() signature to accept SV VCF input parameters and call SV validation
• Simplified temporary file management and improved code organization

scripts/validate_input_cpsr.py


10. pcgr/pcgr_vars.py ⚙️ Configuration changes +156/-47

Update database versions and add OncoKB integration constants

• Updated database version to 20260417 and GENCODE version to 49 for GRCh38
• Updated VEP version from 113 to 115
• Added NA_INTEGER and NA_FLOAT constants for missing numeric annotations
• Expanded GnomAD allele frequency tags to distinguish exome vs genome populations
• Added SITE_TO_ONCOTREE mapping for OncoTree cancer-type codes with tissue-specific codes
• Added OncoKB-related constants: ONCOKB_COLS, VARIANT_CLASS_TO_OKB_CNA, and OncoKB
 MAF/fusion/CNA column requirements
• Added oncogenicity scoring thresholds and VICC/ClinGen criteria with OKB refinement rules
• Reorganized and alphabetized DBNSFP_ALGORITHMS mapping

pcgr/pcgr_vars.py


11. pcgrr/DESCRIPTION Dependencies +2/-3

Update R package dependencies and minimum version

• Removed rrapply from Imports dependencies
• Updated R minimum version requirement from 4.0 to 4.1
• Updated RoxygenNote from 7.3.2 to 7.3.3

pcgrr/DESCRIPTION


12. pcgrr/R/constants.R ⚙️ Configuration changes +1/-0

Add HTML report variant limit constant

• Added new constant MAX_VARS_ALLOWED_HTML set to 750000 for HTML report generation limits

pcgrr/R/constants.R


13. pcgr/arg_checker.py ✨ Enhancement +490/-206

CNA threshold validation and OncoKB integration with flexible input handling

• Added comprehensive validate_cna_thresholds() function to validate CNA amplification, gain, and
 deletion thresholds with support for absolute, relative, and combined threshold modes
• Refactored verify_args() to accept optional logger parameter and added OncoKB token validation
 (UUID format check) and exclusive mode validation
• Added VCF-dependent validation block that only executes when input_vcf is provided, improving
 modularity
• Enhanced input file validation to accept multiple molecular input types (VCF, CNA, RNA fusion, RNA
 expression) instead of requiring VCF
• Added RNA fusion minimum split reads validation and updated CNA overlap parameter naming from
 cna_overlap_pct to cna_transcript_overlap_pct

pcgr/arg_checker.py
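A UUID format check for the OncoKB token can be done with the standard library `uuid` module. This Python sketch assumes that tokens are canonical hyphenated 8-4-4-4-12 UUIDs. `is_valid_oncokb_token` is a hypothetical name, and PCGR's exact rule is not shown here. Because `uuid.UUID()` also accepts braces, a `urn:uuid:` prefix, or no hyphens, the sketch compares against the canonical string form to reject those variants.

```python
import uuid


def is_valid_oncokb_token(token: object) -> bool:
    """True if token is a canonical, hyphenated UUID string (case-insensitive)."""
    if not isinstance(token, str):
        return False
    try:
        parsed = uuid.UUID(token)
    except ValueError:
        return False
    # uuid.UUID() is lenient about input form; require the canonical layout
    return str(parsed) == token.lower()
```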


14. pcgr/annoutils.py ✨ Enhancement +154/-47

Enhanced variant annotation with improved splice site and protein impact assessment

• Added get_vcfanno_tracks() function to centralize vcfanno track configuration and file path
 management
• Moved reverse_complement_dna() import from pcgr.variant to pcgr.utils module
• Enhanced assign_cds_exon_intron_annotations() with improved MaxEntScan splice site impact
 calculations using percentage change instead of absolute difference
• Added new annotation fields including EXON_INTRON_JUNCTION_SPAN, CODON,
 PROTEIN_RELATIVE_POSITION, and improved intron position extraction logic
• Refined loss-of-function filtering for splice donor/acceptor variants with stricter criteria and
 adjusted CDS end truncation threshold from 5% to 2.5%

pcgr/annoutils.py
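The switch from an absolute difference to a percentage change in MaxEntScan scores can be illustrated with the Python sketch below. The formula is an assumption: it scales by the absolute reference score so that a negative result always means the ALT allele weakens the site. The PR summary does not state PCGR's exact formula or the cut-off it applies.

```python
from typing import Optional


def maxentscan_pct_change(ref_score: Optional[float],
                          alt_score: Optional[float]) -> Optional[float]:
    """Percent change of the MaxEntScan score from REF to ALT.

    Scaled by |ref_score|, so negative values mean the ALT allele weakens the site.
    Returns None when either score is missing or the reference score is zero.
    """
    if ref_score is None or alt_score is None or ref_score == 0:
        return None
    return (alt_score - ref_score) / abs(ref_score) * 100.0
```

The same absolute drop of 3 units is -30% for a strong site (10 to 7) but -100% for a weak site (3 to 0). An absolute-difference rule cannot make that distinction.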


15. pcgr/biomarker.py ✨ Enhancement +354/-11

OncoKB integration and fusion biomarker support

• Added load_all_biomarkers() function to load biomarkers from multiple databases (CGI, CIViC)
 with support for MUT, CNA, and FUSION variant types
• Implemented OncoKB integration functions: validate_oncokb_input_file(),
 filter_oncokb_output(), fetch_oncokb_cancer_genes(), and run_oncokb_annotator()
• Enhanced load_biomarkers() to handle FUSION variant types and improved primary site
 classification logic
• Added OncoKB annotator wrapper that supports MAF, fusion, and CNA annotation with optional
 OncoTree code and pre-filtering to cancer genes

pcgr/biomarker.py


16. pcgr/maf.py ✨ Enhancement +352/-7

OncoKB MAF integration and variant identifier tracking

• Added append_oncokb_snv_annotations() function to merge OncoKB SNV/InDel annotations into final
 PCGR TSV using VAR_ID as join key
• Implemented add_var_id_to_vcf() to stamp VAR_ID INFO field (CHROM_POS_REF_ALT format) onto
 VEP-annotated VCF records
• Added construct_hgvsg() utility function to generate HGVSg notation from variant coordinates and
 alleles
• Implemented generate_oncokb_maf_input() to create OncoKB-compatible MAF files with required
 columns and filtering of non-coding variants

pcgr/maf.py
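The two identifiers used by the OncoKB MAF path can be sketched as follows. This Python sketch builds a `CHROM_POS_REF_ALT` key and a simple HGVSg string from VCF-style coordinates. `make_var_id` and `vcf_to_hgvsg` are hypothetical names; PCGR's `add_var_id_to_vcf()` and `construct_hgvsg()` are not shown in this summary. The sketch handles only SNVs and anchor-base deletions/insertions. It does not apply the HGVS 3' shifting rule or produce `dup`/`delins` descriptions, and it assumes that HGVSg consumers expect chromosome names without a `chr` prefix.

```python
def make_var_id(chrom: str, pos: int, ref: str, alt: str) -> str:
    """Variant key in CHROM_POS_REF_ALT form."""
    return f"{chrom}_{pos}_{ref}_{alt}"


def vcf_to_hgvsg(chrom: str, pos: int, ref: str, alt: str) -> str:
    """Simple HGVSg for SNVs and VCF anchor-base deletions/insertions (sketch)."""
    chrom = chrom[3:] if chrom.lower().startswith("chr") else chrom
    if len(ref) == 1 and len(alt) == 1:
        return f"{chrom}:g.{pos}{ref}>{alt}"
    if len(alt) == 1 and len(ref) > 1 and ref[0] == alt[0]:
        # Deletion: bases after the shared anchor base are removed
        start, end = pos + 1, pos + len(ref) - 1
        span = f"{start}" if start == end else f"{start}_{end}"
        return f"{chrom}:g.{span}del"
    if len(ref) == 1 and len(alt) > 1 and alt[0] == ref[0]:
        # Insertion between the anchor base and the next base
        return f"{chrom}:g.{pos}_{pos + 1}ins{alt[1:]}"
    raise NotImplementedError(f"variant type not handled in this sketch: {ref}>{alt}")
```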


17. scripts/pcgr_vcfanno.py ✨ Enhancement +8/-24

Vcfanno track configuration refactoring

• Refactored vcfanno track configuration to use new get_vcfanno_tracks() function from annoutils
 module
• Removed hardcoded track file path definitions and replaced with centralized function call for
 better maintainability

scripts/pcgr_vcfanno.py


18. pcgrr/inst/templates/pcgr_bibliography.bib 📝 Documentation +26/-0

OncoKB bibliography reference addition

• Added OncoKB citation (Chakravarty et al. 2017) to bibliography for reference in documentation

pcgrr/inst/templates/pcgr_bibliography.bib


19. conda/env/yml/pcgrr.yml Dependencies +1/-1

R package version adjustment

• Downgraded r-pcgrr package version from 2.2.5 to 2.2.2

conda/env/yml/pcgrr.yml


20. .github/workflows/build_conda_recipes.yaml Additional files +1/-1

...

.github/workflows/build_conda_recipes.yaml


21. conda/env/yml/pcgr.yml Additional files +2/-2

...

conda/env/yml/pcgr.yml


22. conda/env/yml/pkgdown.yml Additional files +1/-1

...

conda/env/yml/pkgdown.yml


23. conda/recipe/pcgr/meta.yaml Additional files +1/-1

...

conda/recipe/pcgr/meta.yaml


24. pcgr/config.py Additional files +133/-36

...

pcgr/config.py


25. pcgr/cpsr.py Additional files +44/-9

...

pcgr/cpsr.py


26. pcgr/dbnsfp.py Additional files +0/-2

...

pcgr/dbnsfp.py


27. pcgr/expression.py Additional files +27/-18

...

pcgr/expression.py


28. pcgr/validate.py Additional files +342/-0

...

pcgr/validate.py


29. pcgr/variant.py Additional files +123/-69

...

pcgr/variant.py


30. pcgr/vep.py Additional files +60/-26

...

pcgr/vep.py


31. pcgrr/NAMESPACE Additional files +76/-17

...

pcgrr/NAMESPACE


32. pcgrr/R/biomarkers.R Additional files +1476/-1242

...

pcgrr/R/biomarkers.R


33. pcgrr/R/cna.R Additional files +944/-76

...

pcgrr/R/cna.R


34. pcgrr/R/data.R Additional files +17/-7

...

pcgrr/R/data.R


35. pcgrr/R/documentation.R Additional files +154/-0

...

pcgrr/R/documentation.R


36. pcgrr/R/expression.R Additional files +12/-12

...

pcgrr/R/expression.R


37. pcgrr/R/fusion.R Additional files +395/-0

...

pcgrr/R/fusion.R


38. pcgrr/R/germline.R Additional files +235/-296

...

pcgrr/R/germline.R


39. pcgrr/R/input_data.R Additional files +1201/-436

...

pcgrr/R/input_data.R


40. pcgrr/R/maf.R Additional files +10/-10

...

pcgrr/R/maf.R


41. pcgrr/R/main.R Additional files +95/-79

...

pcgrr/R/main.R


42. pcgrr/R/msi.R Additional files +5/-4

...

pcgrr/R/msi.R


43. pcgrr/R/mutation.R Additional files +305/-7

...

pcgrr/R/mutation.R


44. pcgrr/R/mutational_burden.R Additional files +5/-5

...

pcgrr/R/mutational_burden.R


45. pcgrr/R/mutational_signatures.R Additional files +52/-37

...

pcgrr/R/mutational_signatures.R


46. pcgrr/R/oncokb.R Additional files +1511/-0

...

pcgrr/R/oncokb.R


47. pcgrr/R/output_data.R Additional files +279/-110

...

pcgrr/R/output_data.R


48. pcgrr/R/reference_data.R Additional files +129/-232

...

pcgrr/R/reference_data.R


49. pcgrr/R/render_table.R Additional files +1557/-0

...

pcgrr/R/render_table.R


50. pcgrr/R/report.R Additional files +131/-116

...

pcgrr/R/report.R


51. pcgrr/R/utils.R Additional files +47/-396

...

pcgrr/R/utils.R


52. pcgrr/R/variant_annotation.R Additional files +328/-36

...

pcgrr/R/variant_annotation.R


53. pcgrr/R/variant_classification.R Additional files +1469/-263

...

pcgrr/R/variant_classification.R


54. pcgrr/R/variant_stats.R Additional files +1009/-6

...

pcgrr/R/variant_stats.R


55. pcgrr/data-raw/data-raw.R Additional files +458/-308

...

pcgrr/data-raw/data-raw.R


56. pcgrr/data/biomarker_evidence.rda Additional files +0/-0

...

pcgrr/data/biomarker_evidence.rda


57. pcgrr/data/bm_categories.rda Additional files +0/-0

...

pcgrr/data/bm_categories.rda


58. pcgrr/data/bm_evidence.rda Additional files +0/-0

...

pcgrr/data/bm_evidence.rda


59. pcgrr/data/cancer_phenotypes_regex.rda Additional files +0/-0

...

pcgrr/data/cancer_phenotypes_regex.rda


60. pcgrr/data/color_palette.rda Additional files +0/-0

...

pcgrr/data/color_palette.rda


61. pcgrr/data/data_coltype_defs.rda Additional files +0/-0

...

pcgrr/data/data_coltype_defs.rda


62. pcgrr/data/dt_display.rda Additional files +0/-0

...

pcgrr/data/dt_display.rda


63. pcgrr/data/effect_prediction_algos.rda Additional files +0/-0

...

pcgrr/data/effect_prediction_algos.rda


64. pcgrr/data/oncogenicity_criteria.rda Additional files +0/-0

...

pcgrr/data/oncogenicity_criteria.rda


65. pcgrr/data/oncokb_base_api_url.rda Additional files +0/-0

...

pcgrr/data/oncokb_base_api_url.rda


66. pcgrr/data/table_display_cols.rda Additional files +0/-0

...

pcgrr/data/table_display_cols.rda


67. pcgrr/data/tcga_cohorts.rda Additional files +0/-0

...

pcgrr/data/tcga_cohorts.rda


68. pcgrr/data/tsv_cols.rda Additional files +0/-0

...

pcgrr/data/tsv_cols.rda


69. pcgrr/data/tumor_sites.rda Additional files +0/-0

...

pcgrr/data/tumor_sites.rda


70. pcgrr/data/variant_db_url.rda Additional files +0/-0

...

pcgrr/data/variant_db_url.rda


71. pcgrr/inst/templates/doc_notes_md/actionability.md Additional files +34/-0

...

pcgrr/inst/templates/doc_notes_md/actionability.md


72. pcgrr/inst/templates/doc_notes_md/expression.md Additional files +9/-0

...

pcgrr/inst/templates/doc_notes_md/expression.md


73. pcgrr/inst/templates/doc_notes_md/lof.md Additional files +34/-0

...

pcgrr/inst/templates/doc_notes_md/lof.md


74. pcgrr/inst/templates/doc_notes_md/msi.md Additional files +17/-0

...

pcgrr/inst/templates/doc_notes_md/msi.md


75. pcgrr/inst/templates/doc_notes_md/mutational_signatures.md Additional files +16/-0

...

pcgrr/inst/templates/doc_notes_md/mutational_signatures.md


76. pcgrr/inst/templates/doc_notes_md/oncogenicity.md Additional files +27/-0

...

pcgrr/inst/templates/doc_notes_md/oncogenicity.md


77. pcgrr/inst/templates/doc_notes_md/tmb.md Additional files +24/-0

...

pcgrr/inst/templates/doc_notes_md/tmb.md


78. pcgrr/inst/templates/pcgr_quarto.css Additional files +170/-0

...

pcgrr/inst/templates/pcgr_quarto.css


79. pcgrr/inst/templates/pcgr_quarto_report.qmd Additional files +26/-24

...

pcgrr/inst/templates/pcgr_quarto_report.qmd


80. pcgrr/inst/templates/pcgr_quarto_report/cna.qmd Additional files +469/-311

...

pcgrr/inst/templates/pcgr_quarto_report/cna.qmd


81. pcgrr/inst/templates/pcgr_quarto_report/documentation.qmd Additional files +36/-66

...

pcgrr/inst/templates/pcgr_quarto_report/documentation.qmd


82. pcgrr/inst/templates/pcgr_quarto_report/expression.qmd Additional files +3/-3

...

pcgrr/inst/templates/pcgr_quarto_report/expression.qmd


83. pcgrr/inst/templates/pcgr_quarto_report/expression/expression_outliers.qmd Additional files +127/-133

...

pcgrr/inst/templates/pcgr_quarto_report/expression/expression_outliers.qmd


84. pcgrr/inst/templates/pcgr_quarto_report/expression/expression_similarity.qmd Additional files +3/-3

...

pcgrr/inst/templates/pcgr_quarto_report/expression/expression_similarity.qmd


85. pcgrr/inst/templates/pcgr_quarto_report/expression/immune_contexture.qmd Additional files +1/-1

...

pcgrr/inst/templates/pcgr_quarto_report/expression/immune_contexture.qmd


86. pcgrr/inst/templates/pcgr_quarto_report/germline.qmd Additional files +5/-5

...

pcgrr/inst/templates/pcgr_quarto_report/germline.qmd


87. pcgrr/inst/templates/pcgr_quarto_report/kataegis.qmd Additional files +52/-16

...

pcgrr/inst/templates/pcgr_quarto_report/kataegis.qmd


88. pcgrr/inst/templates/pcgr_quarto_report/msi.qmd Additional files +9/-9

...

pcgrr/inst/templates/pcgr_quarto_report/msi.qmd


89. pcgrr/inst/templates/pcgr_quarto_report/mutational_burden.qmd Additional files +7/-6

...

pcgrr/inst/templates/pcgr_quarto_report/mutational_burden.qmd


90. pcgrr/inst/templates/pcgr_quarto_report/mutational_signature.qmd Additional files +19/-19

...

pcgrr/inst/templates/pcgr_quarto_report/mutational_signature.qmd


91. pcgrr/inst/templates/pcgr_quarto_report/mutational_signatures/mutational_spectra.qmd Additional files +3/-3

...

pcgrr/inst/templates/pcgr_quarto_report/mutational_signatures/mutational_spectra.qmd


92. pcgrr/inst/templates/pcgr_quarto_report/mutational_signatures/signature_similarity.qmd Additional files +58/-36

...

pcgrr/inst/templates/pcgr_quarto_report/mutational_signatures/signature_similarity.qmd


93. pcgrr/inst/templates/pcgr_quarto_report/rna_fusion.qmd Additional files +324/-0

...

pcgrr/inst/templates/pcgr_quarto_report/rna_fusion.qmd


94. pcgrr/inst/templates/pcgr_quarto_report/settings.qmd Additional files +172/-166

...

pcgrr/inst/templates/pcgr_quarto_report/settings.qmd


95. pcgrr/inst/templates/pcgr_quarto_report/snv_indel.qmd Additional files +14/-12

...

pcgrr/inst/templates/pcgr_quarto_report/snv_indel.qmd


96. pcgrr/inst/templates/pcgr_quarto_report/snv_indel/actionability.qmd Additional files +178/-408

...

pcgrr/inst/templates/pcgr_quarto_report/snv_indel/actionability.qmd


97. pcgrr/inst/templates/pcgr_quarto_report/snv_indel/oncogenicity.qmd Additional files +211/-250

...

pcgrr/inst/templates/pcgr_quarto_report/snv_indel/oncogenicity.qmd


98. pcgrr/inst/templates/pcgr_quarto_report/snv_indel/variant_filtering.qmd Additional files +2/-2

...

pcgrr/inst/templates/pcgr_quarto_report/snv_indel/variant_filtering.qmd


99. pcgrr/inst/templates/pcgr_quarto_report/snv_indel/variant_statistics.qmd Additional files +3/-3

...

pcgrr/inst/templates/pcgr_quarto_report/snv_indel/variant_statistics.qmd


100. pcgrr/inst/templates/pcgr_quarto_report/tumor_only_statistics.qmd Additional files +9/-9

...

pcgrr/inst/templates/pcgr_quarto_report/tumor_only_statistics.qmd


101. pcgrr/man/actionability_doc_note.Rd Additional files +14/-0

...

pcgrr/man/actionability_doc_note.Rd


102. pcgrr/man/append_alteration_name.Rd Additional files +21/-0

...

pcgrr/man/append_alteration_name.Rd


103. pcgrr/man/append_protein_domains.Rd Additional files +21/-0

...

pcgrr/man/append_protein_domains.Rd


104. pcgrr/man/append_styled_cna_vclass.Rd Additional files +23/-0

...

pcgrr/man/append_styled_cna_vclass.Rd


105. pcgrr/man/assign_amp_asco_cap_tiers.Rd Additional files +47/-0

...

pcgrr/man/assign_amp_asco_cap_tiers.Rd


106. pcgrr/man/assign_amp_asco_tiers.Rd Additional files +0/-29

...

pcgrr/man/assign_amp_asco_tiers.Rd


107. pcgrr/man/assign_bm_tier_support_ttagnostic.Rd Additional files +35/-0

...

pcgrr/man/assign_bm_tier_support_ttagnostic.Rd


108. pcgrr/man/assign_bm_tier_support_ttspecific.Rd Additional files +39/-0

...

pcgrr/man/assign_bm_tier_support_ttspecific.Rd


109. pcgrr/man/assign_germline_popfreq_status.Rd Additional files +5/-17

...

pcgrr/man/assign_germline_popfreq_status.Rd


110. pcgrr/man/assign_germline_popfreq_status_old.Rd Additional files +33/-0

...

pcgrr/man/assign_germline_popfreq_status_old.Rd


111. pcgrr/man/assign_variant_tiers_cna.Rd Additional files +42/-0

...

pcgrr/man/assign_variant_tiers_cna.Rd


112. pcgrr/man/assign_variant_tiers_fusion.Rd Additional files +41/-0

...

pcgrr/man/assign_variant_tiers_fusion.Rd


113. pcgrr/man/assign_variant_tiers_snv_indel.Rd Additional files +42/-0

...

pcgrr/man/assign_variant_tiers_snv_indel.Rd


114. pcgrr/man/assign_variant_top_tiers_ttagnostic.Rd Additional files +32/-0

...

pcgrr/man/assign_variant_top_tiers_ttagnostic.Rd


115. pcgrr/man/assign_variant_top_tiers_ttspecific.Rd Additional files +35/-0

...

pcgrr/man/assign_variant_top_tiers_ttspecific.Rd


116. pcgrr/man/bm_evidence.Rd Additional files +4/-4

...

pcgrr/man/bm_evidence.Rd


117. pcgrr/man/bp_junction_transcript_overlap.Rd Additional files +33/-0

...

pcgrr/man/bp_junction_transcript_overlap.Rd


118. pcgrr/man/build_oncogenicity_col_defs.Rd Additional files +34/-0

...

pcgrr/man/build_oncogenicity_col_defs.Rd


119. pcgrr/man/build_rt_row_details.Rd Additional files +35/-0

...

pcgrr/man/build_rt_row_details.Rd


120. Additional files not shown Additional files +0/-0

...

Additional files not shown



qodo-code-review Bot commented Apr 21, 2026

Code Review by Qodo

🐞 Bugs (3) 📘 Rule violations (0) 📎 Requirement gaps (0)



Action required

1. OncoKB run KeyError 🐞 Bug ≡ Correctness
Description
run_pcgr() reads conf_options['oncokb']['run'] unconditionally, but create_config() only sets
'run' when an OncoKB token is provided. Running without --oncokb_api_token will raise
`KeyError: 'run'` and abort the workflow.
Code

pcgr/main.py[R378-383]

+    if conf_options['oncokb']['run'] == 1:
+        ## i want the output, not the input, to be in the YAML file, since the output is what will be used for the oncokb annotation and downstream analyses
+        conf_options['molecular_data']['fname_oncokb_output_maf_hgvsg'] = oncokb_output_maf_hgvsg
+        conf_options['molecular_data']['fname_oncokb_output_maf_hgvsp'] = oncokb_output_maf_hgvsp
+        conf_options['molecular_data']['fname_oncokb_output_fusions'] = oncokb_output_fusion_tsv
+        conf_options['molecular_data']['fname_oncokb_output_cna'] = oncokb_output_cna_tsv
Evidence
In pcgr/main.py, the pipeline branches on conf_options['oncokb']['run'] without guarding for
missing keys. In pcgr/config.py, the oncokb dict is created without a run key unless
api_token is not None, so the common case (no token) will crash before YAML
generation/validation completes.

pcgr/main.py[353-389]
pcgr/config.py[114-126]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
PCGR crashes when `--oncokb_api_token` is not provided because `conf_options['oncokb']['run']` is accessed in `pcgr/main.py` but the `run` key is only added conditionally in `pcgr/config.py`.
### Issue Context
`oncokb` is optional; the workflow must run without any token.
### Fix Focus Areas
- pcgr/config.py[114-126]
- pcgr/main.py[353-383]
### Suggested fix
1. In `create_config()`, always set `conf_options['oncokb']['run'] = 0` during initialization (and then set to `1` when token is present).
2. Make `pcgr/main.py` more defensive (e.g., `if conf_options.get('oncokb', {}).get('run') == 1:`) to avoid similar crashes if config shape changes again.
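Both parts of the suggested fix are sketched below in Python, assuming a plain-dict config. `build_oncokb_conf()` and `oncokb_enabled()` are hypothetical helpers, not functions from `pcgr/config.py` or `pcgr/main.py`.

```python
from typing import Any, Dict, Optional


def build_oncokb_conf(api_token: Optional[str],
                      oncotree_code: Optional[str]) -> Dict[str, Any]:
    """Always define 'run' so downstream code never hits a missing key."""
    conf: Dict[str, Any] = {
        "api_token": api_token,
        "oncotree_code": oncotree_code,
        "run": 0,  # default: OncoKB disabled
    }
    if api_token is not None:
        conf["run"] = 1
    return conf


def oncokb_enabled(conf_options: Dict[str, Any]) -> bool:
    """Defensive check that tolerates a missing 'oncokb' section or 'run' key."""
    return conf_options.get("oncokb", {}).get("run") == 1
```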



2. OncoTree check_file_exists misuse 🐞 Bug ☼ Reliability
Description
verify_oncotree_code() calls check_file_exists(oncotree_fname, logger) using a positional
argument that binds to strict rather than logger. If the OncoTree TSV is missing, this will
raise via error_message instead of warning and continuing as intended.
Code

pcgr/config.py[R490-494]

+    if not check_file_exists(oncotree_fname, logger):
+        warn_message(
+            f"OncoTree annotation file not found: {oncotree_fname} "
+            "- skipping OncoTree code validation", logger)
+        return oncotree_code
Evidence
check_file_exists is defined as (fname, strict=True, logger=None). Passing logger as the
second positional argument sets strict to the logger object (truthy) and leaves logger=None, so a missing file
triggers a hard error instead of the non-fatal warning path verify_oncotree_code() is trying to
implement.

pcgr/config.py[482-495]
pcgr/utils.py[229-243]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`verify_oncotree_code()` incorrectly calls `check_file_exists()` with positional arguments, causing missing OncoTree reference data to become fatal.
### Issue Context
The code path is meant to *skip validation* when the OncoTree mapping file is absent.
### Fix Focus Areas
- pcgr/config.py[487-495]
- pcgr/utils.py[229-243]
### Suggested fix
Change the call to use keyword arguments and non-strict behavior: `check_file_exists(oncotree_fname, strict=False, logger=logger)`.
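To see why the positional call misfires, here is a self-contained Python stand-in that mirrors the `(fname, strict=True, logger=None)` signature described in this finding. It is not PCGR's `check_file_exists()`. Raising `FileNotFoundError` stands in for `error_message()`, and `verify_oncotree_code_sketch` is a hypothetical name.

```python
import logging
import os
from typing import Optional


def check_file_exists(fname: str, strict: bool = True,
                      logger: Optional[logging.Logger] = None) -> bool:
    """Stand-in with the (fname, strict=True, logger=None) signature."""
    if os.path.exists(fname):
        return True
    if strict:
        raise FileNotFoundError(fname)  # stands in for error_message(...)
    if logger is not None:
        logger.warning("File not found: %s", fname)
    return False


def verify_oncotree_code_sketch(oncotree_code: str, oncotree_fname: str,
                                logger: logging.Logger) -> str:
    # Keyword arguments make the non-fatal intent explicit
    if not check_file_exists(oncotree_fname, strict=False, logger=logger):
        logger.warning("OncoTree annotation file not found: %s "
                       "- skipping OncoTree code validation", oncotree_fname)
        return oncotree_code
    # ... validation against the OncoTree TSV would go here ...
    return oncotree_code
```

Against this stand-in, `check_file_exists(missing_path, logger)` raises because `strict` receives the logger object. The keyword form returns `False` and lets the caller warn and continue.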



3. OncoKB token persisted to YAML 🐞 Bug ⛨ Security
Description
The OncoKB API token is stored in conf_options and ends up written to .conf.yaml, leaking
credentials into a shareable output artifact. The code even contains a commented-out redaction
block, but it is not executed.
Code

pcgr/config.py[R114-121]

+        conf_options['oncokb'] = {
+            'api_token': str(arg_dict['oncokb_api_token']) if arg_dict['oncokb_api_token'] is not None else None,
+            'oncotree_code': str(arg_dict['oncokb_oncotree_code']) if arg_dict['oncokb_oncotree_code'] is not None else None,
+            'exclusive': int(arg_dict['oncokb_exclusive']),
+            'data_version': None,
+            'data_release_date': None,
+            'api_version': None
+        }
Evidence
pcgr/config.py stores the bearer token as conf_options['oncokb']['api_token']. Later,
pcgr/main.py writes yaml_data to disk via yaml.dump(yaml_data); yaml_data is derived from
conf_options (populate_config_data(conf_options, ...)), so the token will be serialized unless
explicitly removed/redacted.

pcgr/config.py[114-125]
pcgr/main.py[990-993]
pcgr/main.py[1019-1026]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The OncoKB API token is written to the output YAML config file, which can be shared/archived and exposes credentials.
### Issue Context
PCGR writes `<output_prefix>.conf.yaml` for downstream reporting. That file should not contain secrets.
### Fix Focus Areas
- pcgr/config.py[114-125]
- pcgr/main.py[990-993]
- pcgr/main.py[1019-1026]
### Suggested fix
Implement one of:
1. **Do not serialize the token**: before writing YAML, set `yaml_data['conf']['oncokb']['api_token'] = None` (or delete the key).
2. **Keep token only in-memory**: store token outside `conf_options`/`yaml_data` entirely and pass it only to the OncoKB execution step.
At minimum, uncomment/activate the existing redaction logic (but preferably redact *before* writing the file the first time).
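Option 1 is sketched below in Python, assuming the YAML payload is a nested dict. `redact_secrets()` and `SECRET_KEYS` are hypothetical names. The function returns a redacted copy, so the in-memory config can still feed the OncoKB step while only the copy is serialized.

```python
from typing import Any

# Keys whose values must never be written to disk (illustrative list)
SECRET_KEYS = frozenset({"api_token"})


def redact_secrets(data: Any) -> Any:
    """Return a copy of nested dicts/lists with secret values replaced by None."""
    if isinstance(data, dict):
        return {k: (None if k in SECRET_KEYS else redact_secrets(v))
                for k, v in data.items()}
    if isinstance(data, list):
        return [redact_secrets(v) for v in data]
    return data
```

The redacted copy, not the original `yaml_data`, would then be passed to `yaml.dump`.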




Remediation recommended

4. OncoKB token in argv 🐞 Bug ⛨ Security
Description
run_oncokb_annotator() passes the OncoKB bearer token to the annotator scripts as a `-b <token>`
subprocess argument. Command-line arguments are typically visible via process listings on multi-user
systems, exposing the token.
Code

pcgr/biomarker.py[R653-660]

+            cmd = [
+               script,
+               "-i", input_path,
+               "-o", output_path,
+               "-b", oncokb_token,
+               "-q", query_type,
+               "-r", build,
+               "-d"
Evidence
The command array for the annotator scripts includes `"-b", oncokb_token` and is executed via
`subprocess.run(...)`. This puts the secret in the child process's argv for the lifetime of the
process.

pcgr/biomarker.py[653-671]
pcgr/biomarker.py[692-710]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution.

## Issue description
OncoKB token is exposed in subprocess argv via `-b <token>`.
### Issue Context
On shared hosts (HPC/VMs), other users can often read process arguments.
### Fix Focus Areas
- pcgr/biomarker.py[653-671]
- pcgr/biomarker.py[692-710]
### Suggested fix
Prefer a secret channel that does not appear in argv:
- Pass the token via an environment variable (e.g. `ONCOKB_TOKEN`) using `subprocess.run(..., env=...)`.
- Update the annotator invocation to read the token from env (or, if the annotator supports it, from stdin / a protected temp file with `0600` permissions) rather than `-b`.
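Here is a minimal sketch of the environment-variable route. It assumes the annotator scripts are updated to read `ONCOKB_TOKEN` (the name suggested above), as the fix requires. `build_annotator_call` and `run_annotator` are hypothetical helpers. The flags are copied from the `cmd` excerpt, minus `-b`.

```python
import os
import subprocess
from typing import Dict, List, Tuple

# Variable name suggested in the review; the annotator scripts must be
# adapted to read it (assumption, not current behaviour).
TOKEN_ENV_VAR = "ONCOKB_TOKEN"


def build_annotator_call(script: str, input_path: str, output_path: str,
                         query_type: str, build: str,
                         oncokb_token: str) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, env) for an annotator run with the token kept out of argv."""
    cmd = [
        script,
        "-i", input_path,
        "-o", output_path,
        "-q", query_type,
        "-r", build,
        "-d",
    ]
    # Copy the parent environment so PATH etc. are preserved, then add the
    # secret for the child process only.
    env = os.environ.copy()
    env[TOKEN_ENV_VAR] = oncokb_token
    return cmd, env


def run_annotator(cmd: List[str], env: Dict[str, str]) -> None:
    # check=True raises CalledProcessError on a non-zero exit status.
    subprocess.run(cmd, env=env, check=True)
```

On Linux, `/proc/<pid>/cmdline` is readable by all users by default, while `/proc/<pid>/environ` is readable only by the process owner and root. Moving the token from argv into the environment therefore narrows who can see it.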




Comment thread pcgr/main.py
Comment thread pcgr/config.py
Comment on lines +490 to +494
```python
if not check_file_exists(oncotree_fname, logger):
    warn_message(
        f"OncoTree annotation file not found: {oncotree_fname} "
        "- skipping OncoTree code validation", logger)
    return oncotree_code
```

Action required

2. OncoTree check_file_exists misuse 🐞 Bug ☼ Reliability

`verify_oncotree_code()` calls `check_file_exists(oncotree_fname, logger)`, passing `logger` as a positional
argument that binds to `strict` rather than `logger`. Because a logger object is truthy, strict mode is enabled,
so a missing OncoTree TSV raises via `error_message` instead of warning and continuing as intended.
Agent Prompt
### Issue description
`verify_oncotree_code()` incorrectly calls `check_file_exists()` with positional arguments, causing missing OncoTree reference data to become fatal.

### Issue Context
The code path is meant to *skip validation* when the OncoTree mapping file is absent.

### Fix Focus Areas
- pcgr/config.py[487-495]
- pcgr/utils.py[229-243]

### Suggested fix
Change the call to use keyword arguments and non-strict behavior:
```python
if not check_file_exists(oncotree_fname, strict=False, logger=logger):
    warn_message(...)
    return oncotree_code
```
(or replace with an `os.path.isfile` check if you don't want file-size checks here).
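The following self-contained demonstration shows the binding problem. It uses a stand-in `check_file_exists` with the `(fname, strict=True, logger=None)` parameter order the review describes; this signature is assumed, not copied from pcgr/utils.py. The second function shows an optional hardening: if `strict` and `logger` are keyword-only, the same positional mistake fails immediately with a `TypeError`.

```python
import logging
import os
from typing import Optional


def check_file_exists(fname: str, strict: bool = True,
                      logger: Optional[logging.Logger] = None) -> bool:
    """Stand-in using the parameter order described in the review (assumed).

    Judging from the review, the real helper also checks file size; this does not.
    """
    if os.path.isfile(fname):
        return True
    if strict:
        # Stand-in for pcgr's error_message(), which makes the miss fatal.
        raise FileNotFoundError(f"File does not exist: {fname}")
    if logger is not None:
        logger.warning("File does not exist: %s", fname)
    return False


def check_file_exists_kw(fname: str, *, strict: bool = True,
                         logger: Optional[logging.Logger] = None) -> bool:
    """Same check, but `strict` and `logger` are keyword-only (note the `*`)."""
    return check_file_exists(fname, strict=strict, logger=logger)


log = logging.getLogger("pcgr-demo")
missing = "no_such_oncotree_file.tsv"

# Buggy call: `log` binds to `strict`; a Logger object is truthy, so strict mode.
try:
    check_file_exists(missing, log)
except FileNotFoundError:
    print("positional call: fatal")

# Fixed call from the review: non-strict, only warns and returns False.
print(check_file_exists(missing, strict=False, logger=log))  # False
```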


Comment thread pcgr/config.py
Comment on lines +114 to +121
```python
conf_options['oncokb'] = {
    'api_token': str(arg_dict['oncokb_api_token']) if arg_dict['oncokb_api_token'] is not None else None,
    'oncotree_code': str(arg_dict['oncokb_oncotree_code']) if arg_dict['oncokb_oncotree_code'] is not None else None,
    'exclusive': int(arg_dict['oncokb_exclusive']),
    'data_version': None,
    'data_release_date': None,
    'api_version': None
}
```

Action required

3. OncoKB token persisted to YAML 🐞 Bug ⛨ Security

The OncoKB API token is stored in conf_options and ends up written to <sample>.conf.yaml,
leaking credentials into a shareable output artifact. The code even contains a commented-out
redaction block, but it is not executed.
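For the "keep token only in-memory" option, here is a sketch of building the OncoKB config block without the token. It assumes the same `arg_dict` keys as the excerpt above. `build_oncokb_conf` is a hypothetical helper. It returns the serializable block and the token separately, so the caller can store the block in `conf_options` and pass the token directly to the OncoKB step.

```python
from typing import Any, Dict, Optional, Tuple


def build_oncokb_conf(arg_dict: Dict[str, Any]) -> Tuple[Dict[str, Any], Optional[str]]:
    """Split OncoKB options into (serializable config, in-memory token)."""
    token = arg_dict['oncokb_api_token']
    code = arg_dict['oncokb_oncotree_code']
    oncokb_conf = {
        # No 'api_token' key: nothing secret can reach conf.yaml from here.
        'oncotree_code': str(code) if code is not None else None,
        'exclusive': int(arg_dict['oncokb_exclusive']),
        'data_version': None,
        'data_release_date': None,
        'api_version': None,
    }
    return oncokb_conf, (str(token) if token is not None else None)
```

Any code that currently reads `conf_options['oncokb']['api_token']` to decide whether OncoKB querying is enabled would need another signal, such as the returned token itself.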
