fix: normalise Kleborate column names in import_kleborate() to prevent silent data loss #108

Closed
efosternyarko wants to merge 2 commits into AMRverse:main from efosternyarko:fix/kleborate-column-case-normalisation


@efosternyarko (Collaborator) commented May 4, 2026

Problem

Different versions of Kleborate (and the same version depending on whether --strain is used) produce inconsistently capitalised column names. For example:

| Drug class | Kleborate v2/v3 (--strain) | Kleborate v3 (default) |
|---|---|---|
| Aminoglycosides | Agly_acquired | AGly_acquired |
| Chromosomal beta-lactam | Bla_Chr | Bla_chr |

Because import_kleborate() uses select(any_of(kleborate_class_table$Kleborate_Class)), columns whose names differ only in capitalisation are silently dropped — no warning, no error. Entire drug classes disappear from the returned genotype table.
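To make the failure mode concrete, here is a minimal base-R sketch. It stands in for select(any_of(expected)) with intersect(), which behaves the same way for this purpose: only names that match exactly (including case) are kept, and absent names are skipped without a warning or error. The toy data and the expected vector are illustrative assumptions, not the package's kleborate_class_table.

```r
# Illustrative stand-in for kleborate_class_table$Kleborate_Class
expected <- c("AGly_acquired", "Bla_chr")

# Toy input whose column names differ from `expected` only in capitalisation
in_table <- data.frame(
  strain = c("s1", "s2"),
  Agly_acquired = c("-", "-"),
  Bla_Chr = c("SHV-1", "SHV-11")
)

# Base-R analogue of select(any_of(expected)): matching is exact and
# case-sensitive, and expected names that are absent are skipped silently
kept <- in_table[, intersect(expected, names(in_table)), drop = FALSE]

ncol(kept)  # 0: both drug-class columns were dropped with no warning
```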

Discovered while running the function on real Klebsiella pneumoniae data from The Gambia: Aminoglycosides showed 0 markers and 0% sensitivity, tracing back to Agly_acquired being silently skipped.
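A pre-flight check along these lines could surface the problem before any markers are counted. This base-R sketch lists every expected column missing from the input and, where one exists, the input column that matches it ignoring case. report_missing_cols() is a hypothetical helper, and the column names (including Col_acquired) are illustrative only.

```r
# Hypothetical helper: report expected columns absent from `df`, and the
# input column (if any) that matches each one when case is ignored
report_missing_cols <- function(df, expected) {
  present <- names(df)
  missing <- setdiff(expected, present)
  # Case-insensitive lookup; NA when there is no near match either
  near <- present[match(tolower(missing), tolower(present))]
  data.frame(expected = missing, found_as = near, stringsAsFactors = FALSE)
}

expected <- c("AGly_acquired", "Bla_chr", "Col_acquired")
in_table <- data.frame(strain = "s1", Agly_acquired = "-", Bla_Chr = "SHV-1")

report_missing_cols(in_table, expected)
# found_as is "Agly_acquired", "Bla_Chr", NA for the three expected names
```

A non-empty result with a non-NA found_as points to a capitalisation mismatch rather than a genuinely absent drug class.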

Fix

Add a case-insensitive rename step immediately after the sample_col rename (line 400) and before the select(any_of(...)) call. Any input column whose name matches an expected column name case-insensitively — but not exactly — is renamed to the expected capitalisation. An informative message() is emitted when this happens so users are aware of the discrepancy in their Kleborate output.

# Map lower-cased expected names to their canonical capitalisation
expected_cols_lower <- setNames(
  kleborate_class_table$Kleborate_Class,
  tolower(kleborate_class_table$Kleborate_Class)
)
# Input columns that match an expected name case-insensitively, but not exactly
cols_to_rename <- names(in_table)[
  tolower(names(in_table)) %in% names(expected_cols_lower) &
  !(names(in_table) %in% kleborate_class_table$Kleborate_Class)
]
if (length(cols_to_rename) > 0) {
  # rename() takes new_name = old_name pairs, so expected names become the names
  rename_vec <- setNames(cols_to_rename, expected_cols_lower[tolower(cols_to_rename)])
  in_table <- in_table %>% rename(!!!rename_vec)
  # Report which columns were renamed so users can spot the discrepancy
  message(
    "Normalised Kleborate column name(s) to match expected capitalisation: ",
    paste(cols_to_rename, "->", names(rename_vec), collapse = ", ")
  )
}
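For trying the logic outside the package, here is a self-contained base-R sketch of the same case-insensitive rename. It assumes a plain character vector of expected names in place of kleborate_class_table, uses names<- instead of dplyr::rename(), and adds one guard the PR's snippet does not have: it stops if a renamed column would collide with a column that already carries the expected name. normalise_col_case() is a hypothetical name. The usage lines reproduce the two scenarios listed under Testing.

```r
# Hypothetical base-R sketch; the PR's snippet uses dplyr::rename() instead.
normalise_col_case <- function(df, expected) {
  # Map lower-cased expected names back to their canonical capitalisation
  lookup <- setNames(expected, tolower(expected))
  nm <- names(df)
  # Columns that match an expected name ignoring case, but not exactly
  to_fix <- tolower(nm) %in% names(lookup) & !(nm %in% expected)
  if (!any(to_fix)) return(df)  # nothing to do: no rename, no message

  new_nm <- nm
  new_nm[to_fix] <- unname(lookup[tolower(nm[to_fix])])

  # Stop if a renamed column would collide with another column's name,
  # e.g. when the input has both Agly_acquired and AGly_acquired
  dups <- unique(new_nm[duplicated(new_nm)])
  clash <- intersect(dups, new_nm[to_fix])
  if (length(clash) > 0) {
    stop("Case-insensitive renaming would duplicate column name(s): ",
         paste(clash, collapse = ", "))
  }

  message(
    "Normalised Kleborate column name(s) to match expected capitalisation: ",
    paste(nm[to_fix], "->", new_nm[to_fix], collapse = ", ")
  )
  names(df) <- new_nm
  df
}

expected <- c("AGly_acquired", "Bla_chr")

# Older capitalisation: both columns are renamed and a message is emitted
old <- data.frame(strain = "s1", Agly_acquired = "-", Bla_Chr = "SHV-1")
names(normalise_col_case(old, expected))
# "strain" "AGly_acquired" "Bla_chr"

# Expected capitalisation: returned unchanged, with no message
ok <- data.frame(strain = "s1", AGly_acquired = "-", Bla_chr = "SHV-1")
identical(normalise_col_case(ok, expected), ok)  # TRUE
```

Stopping on a collision is a judgement call: with base names<- the clash would otherwise silently produce duplicate column names, while dplyr::rename() is expected to refuse duplicate names with a more generic error.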

Testing

Verified against:

  • Kleborate output with Agly_acquired / Bla_Chr (older capitalisation): aminoglycosides now correctly returned with 15 markers across 87 isolates
  • Kleborate output already using AGly_acquired / Bla_chr (expected capitalisation): no rename, no message, behaviour unchanged

Different Kleborate versions (and the same version, depending on whether --strain is used) produce inconsistently capitalised column names, e.g. 'Agly_acquired' vs 'AGly_acquired' and 'Bla_Chr' vs 'Bla_chr'. The downstream select(any_of(...)) call silently drops any column whose name does not exactly match kleborate_classes$Kleborate_Class, so entire drug classes (e.g. aminoglycosides, chromosomal beta-lactams) are absent from the returned genotype table with no warning.

Add a case-insensitive rename step immediately after the sample-column rename so that any column whose name differs only in capitalisation is corrected before the any_of() selection. Emit an informative message() when a rename occurs so users are aware of the mismatch.

R CMD check treats non-ASCII characters in source files as a WARNING, which fails CI. Replace the UTF-8 right-arrow (U+2192) with ASCII '->'.
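Base R already ships tools::showNonASCIIfile() for listing offending lines in a file; the sketch below shows the same idea on strings, and why ASCII-safe spellings pass. has_non_ascii() is a hypothetical helper. Besides the plain '->' chosen in this commit, a \u2192 escape inside a string literal would also keep the source file ASCII while still producing the arrow at run time.

```r
# Hypothetical helper: TRUE for each string containing a code point above 127
has_non_ascii <- function(x) {
  vapply(x, function(s) any(utf8ToInt(enc2utf8(s)) > 127L),
         logical(1), USE.NAMES = FALSE)
}

# Three ways a source line might spell the arrow in the message
src_lines <- c(
  'paste(a, "\u2192", b)',   # contains an actual U+2192 character
  'paste(a, "\\u2192", b)',  # contains only the ASCII characters \u2192
  'paste(a, "->", b)'        # plain ASCII, as in this commit
)
has_non_ascii(src_lines)  # TRUE FALSE FALSE
```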
@efosternyarko (Collaborator, Author)

Closing — the capitalisation mismatch (Agly_acquired / Bla_Chr) was in a manually prepared Excel file used in our analysis, not from Kleborate or Pathogenwatch output. The kleborate_classes lookup is correct. Will fix the column names in our data file locally. Sorry for the noise!
