fix: normalise Kleborate column names in import_kleborate() to prevent silent data loss by efosternyarko · Pull Request #108 · AMRverse/AMRgen

efosternyarko · 2026-05-04T20:28:36Z

Problem

Different versions of Kleborate (and the same version depending on whether --strain is used) produce inconsistently capitalised column names. For example:

| Column | Kleborate v2/v3 (`--strain`) | Kleborate v3 (default) |
|---|---|---|
| Aminoglycosides | `Agly_acquired` | `AGly_acquired` |
| Chromosomal beta-lactam | `Bla_Chr` | `Bla_chr` |

Because import_kleborate() uses select(any_of(kleborate_class_table$Kleborate_Class)), columns whose names differ only in capitalisation are silently dropped — no warning, no error. Entire drug classes disappear from the returned genotype table.

Discovered while running the function on real Klebsiella pneumoniae data from The Gambia: Aminoglycosides showed 0 markers and 0% sensitivity, tracing back to Agly_acquired being silently skipped.
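The silent drop is easy to reproduce outside the package. The following is a minimal sketch, assuming dplyr is installed; the toy table, its values, and the `expected` vector are invented for illustration and are not taken from AMRgen.

```r
library(dplyr)

# Toy stand-in for Kleborate output; the values are invented for illustration.
in_table <- tibble(
  strain = c("s1", "s2"),
  Agly_acquired = c("aac(3)-IIa", "-"),
  Bla_chr = c("SHV-11", "SHV-1")
)

# Expected names, as they might appear in a class lookup table (assumed here).
expected <- c("AGly_acquired", "Bla_chr")

# any_of() keeps exact matches only and ignores missing names without a warning.
selected <- in_table %>% select(strain, any_of(expected))
names(selected)
```

Here `Agly_acquired` is absent from the result because it differs from `AGly_acquired` by one capital letter, and nothing is printed to flag it.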

Fix

Add a case-insensitive rename step immediately after the sample_col rename (line 400) and before the select(any_of(...)) call. Any input column whose name matches an expected column name case-insensitively — but not exactly — is renamed to the expected capitalisation. An informative message() is emitted when this happens so users are aware of the discrepancy in their Kleborate output.

# Map lower-cased expected names to their canonical capitalisation.
expected_cols_lower <- setNames(
  kleborate_class_table$Kleborate_Class,
  tolower(kleborate_class_table$Kleborate_Class)
)
# Input columns that match an expected name case-insensitively, but not exactly.
cols_to_rename <- names(in_table)[
  tolower(names(in_table)) %in% names(expected_cols_lower) &
    !(names(in_table) %in% kleborate_class_table$Kleborate_Class)
]
if (length(cols_to_rename) > 0) {
  # rename() takes new_name = old_name, so the canonical names become the vector names.
  rename_vec <- setNames(cols_to_rename, expected_cols_lower[tolower(cols_to_rename)])
  in_table <- in_table %>% rename(!!!rename_vec)
  message(
    "Normalised Kleborate column name(s) to match expected capitalisation: ",
    paste(cols_to_rename, "->", names(rename_vec), collapse = ", ")
  )
}

Testing

Verified against:

Kleborate output with Agly_acquired / Bla_Chr (older capitalisation): aminoglycosides now correctly returned with 15 markers across 87 isolates
Kleborate output already using AGly_acquired / Bla_chr (expected capitalisation): no rename, no message, behaviour unchanged
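The two cases above could also be captured as a small automated check. This is only a sketch: `normalise_case()` is a hypothetical base-R helper that mirrors the rename step, not a function in AMRgen, and the one-row tables are invented.

```r
# Hypothetical helper mirroring the rename step; not part of AMRgen.
normalise_case <- function(in_table, expected) {
  lookup <- setNames(expected, tolower(expected))
  current <- names(in_table)
  # Columns that match an expected name ignoring case, but not exactly.
  needs_fix <- tolower(current) %in% names(lookup) & !(current %in% expected)
  if (any(needs_fix)) {
    fixed <- unname(lookup[tolower(current[needs_fix])])
    message(
      "Normalised Kleborate column name(s) to match expected capitalisation: ",
      paste(current[needs_fix], "->", fixed, collapse = ", ")
    )
    names(in_table)[needs_fix] <- fixed
  }
  in_table
}

expected <- c("AGly_acquired", "Bla_chr")

# Case 1: older capitalisation is corrected.
old_style <- data.frame(strain = "s1", Agly_acquired = "-", Bla_Chr = "SHV-11")
fixed_table <- normalise_case(old_style, expected)
stopifnot(identical(names(fixed_table), c("strain", "AGly_acquired", "Bla_chr")))

# Case 2: expected capitalisation passes through with no rename and no message.
new_style <- data.frame(strain = "s1", AGly_acquired = "-", Bla_chr = "SHV-11")
unchanged <- tryCatch(
  normalise_case(new_style, expected),
  message = function(m) stop("unexpected message: ", conditionMessage(m))
)
stopifnot(identical(unchanged, new_style))
```

Converting any `message()` into an error in the second case makes the "no message" expectation explicit rather than relying on reading console output.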

Different Kleborate versions (and the same version with different --strain vs default output) produce inconsistently capitalised column names, e.g. 'Agly_acquired' vs 'AGly_acquired' and 'Bla_Chr' vs 'Bla_chr'. The downstream select(any_of(...)) call silently drops any column whose name does not exactly match kleborate_classes$Kleborate_Class, causing entire drug classes (e.g. aminoglycosides, beta-lactam chromosomal) to be absent from the returned genotype table with no warning. Add a case-insensitive rename step immediately after the sample-column rename so that any column whose name differs only in capitalisation is corrected before the any_of() selection. Emit an informative message() when a rename occurs so users are aware of the mismatch.

efosternyarko · 2026-05-08T17:29:37Z

Closing — the capitalisation mismatch (Agly_acquired / Bla_Chr) was in a manually prepared Excel file used in our analysis, not from Kleborate or Pathogenwatch output. The kleborate_classes lookup is correct. Will fix the column names in our data file locally. Sorry for the noise!

efosternyarko added 2 commits May 4, 2026 21:25

fix: replace non-ASCII arrow with ASCII -> in message() call

d435c81

R CMD check treats non-ASCII characters in source files as a WARNING, which fails CI. Replace the UTF-8 right-arrow (U+2192) with ASCII '->'.
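A quick way to spot such characters before CI does is to look for code points above 127. The sketch below uses base R only; the two strings are illustrative stand-ins for source lines, not the actual lines from the commit.

```r
# Two candidate strings; "\u2192" is the right arrow (U+2192).
src <- c(
  "cols_to_rename \u2192 names(rename_vec)",
  "cols_to_rename -> names(rename_vec)"
)

# A string is non-ASCII if any of its code points exceeds 127.
has_non_ascii <- vapply(
  src,
  function(s) any(utf8ToInt(s) > 127L),
  logical(1),
  USE.NAMES = FALSE
)
has_non_ascii
```

For files on disk, `tools::showNonASCIIfile()` from base R's tools package reports the offending lines directly.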

efosternyarko closed this May 8, 2026

efosternyarko wants to merge 2 commits into AMRverse:main from efosternyarko:fix/kleborate-column-case-normalisation