fix: Resolve duplicate columns in Stunner DLS output #1213

Merged
nathan-stender merged 2 commits into main from fix/stunner-dls-column-duplicates on Jun 1, 2026

@nathan-stender
Collaborator

Summary

  • Stunner parser now recognizes alternate DLS column names used by newer instrument software (e.g., Z-Avg Dia. (nm) vs Z Ave. Dia (nm), Rayleigh ratio R (1/km) vs (cm^-1), Viscosity at 25°C vs 20°C)
  • Previously unmapped DLS fields (Number of Peaks, Number of Angles, Angles Measured, Intercept) are now structured as calculated data documents instead of leaking to custom info
  • The number_of_averages and Number of acquisitions used lookups now try both the (total) and non-(total) variants of their column names
  • B/S/F/R (sample role type) added to sample custom info
  • Removed 52 N/A placeholder columns from data_system_document.custom_information_document that were causing duplicate columns in connector CSV output

Context

Follow-up to #1205. Customer (ProFound) reported 17 remaining duplicate columns after the PkN fix. Root cause was twofold:

  1. CALCULATED_DATA_LOOKUP did not recognize newer column naming conventions, so DLS fields fell through get_unread() into measurement_custom_info
  2. When the parser reads metadata from the first data row (no-header mode), unread DLS/peak columns leaked into data_system_document.custom_information_document as N/A placeholders — the CSV conversion lib then produced one column from the placeholder and one from the actual structured data
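
The first root cause can be illustrated with a minimal sketch. It assumes a row object that records which keys have been read, so that anything left over is what get_unread() would hand to custom info. The `Row` class, the `LOOKUP` entries, and `extract_calculated` are hypothetical stand-ins for illustration only; the real CALCULATED_DATA_LOOKUP and parser classes are not shown in this PR text and likely have a different shape.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Row:
    """Hypothetical stand-in for a parsed data row that tracks which keys were read."""

    data: Dict[str, str]
    read_keys: Set[str] = field(default_factory=set)

    def get(self, *names: str) -> Optional[str]:
        # Try the primary name first, then each alternate; mark the first hit as read.
        for name in names:
            if name in self.data:
                self.read_keys.add(name)
                return self.data[name]
        return None

    def get_unread(self) -> Dict[str, str]:
        # Keys that were never read are the ones that would leak into custom info.
        return {k: v for k, v in self.data.items() if k not in self.read_keys}


# Illustrative entries only (assumed shape, not the actual lookup table).
LOOKUP: List[dict] = [
    {"column": "z ave. dia (nm)", "alt_columns": ["z-avg dia. (nm)"]},
    {"column": "intercept", "alt_columns": []},
]


def extract_calculated(row: Row) -> Dict[str, str]:
    """Resolve each lookup entry using its primary name or any alternate name."""
    out: Dict[str, str] = {}
    for entry in LOOKUP:
        value = row.get(entry["column"], *entry["alt_columns"])
        if value is not None:
            out[entry["column"]] = value
    return out


row = Row({"z-avg dia. (nm)": "12.4", "intercept": "0.91", "sample name": "A1"})
calculated = extract_calculated(row)
leftover = row.get_unread()
```

Without the alternate name, "z-avg dia. (nm)" would never be marked as read and would show up in `leftover`, which is the leak described in item 1.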

Test plan

  • All 34 Stunner/Lunatic tests pass (33 existing + 1 new)
  • All 12 Lunatic discovery tests pass
  • New anonymized test file (example03) exercises the new PkN column format with blank, no-peaks, single-peak, and multi-peak measurements
  • Verified customer file: 0/85 measurements have measurement_custom_info content
  • Verified data_system_document.custom_information_document is empty (was 52 N/A keys)
  • Lint clean (black, ruff, mypy)

🤖 Generated with Claude Code

nathan-stender and others added 2 commits June 1, 2026 14:47
The Stunner parser was not recognizing alternate column naming conventions
used by newer instrument software, causing DLS fields to leak into
measurement_custom_info while also appearing as structured (but empty)
calculated data documents — producing duplicate columns in connector output.

- Add alt_columns support to CALCULATED_DATA_LOOKUP for column name variants
  (e.g. "z-avg dia. (nm)" vs "z ave. dia (nm)", "rayleigh ratio r (1/km)"
  vs "rayleigh ratio r (cm^-1)", "viscosity at 25°c" vs "viscosity at 20°c")
- Add number_of_peaks, number_of_angles, angles_measured, and intercept to
  CALCULATED_DATA_LOOKUP as structured DLS calculated data
- Fix number_of_averages to also try "number of acquisitions (total)"
- Fix device_control_custom_info to try "number of acquisitions used (total)"
- Add sample_role_type (B/S/F/R) to sample_custom_info
- Add anonymized test file exercising new column format

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The header SeriesData (read from the first data row in no-header mode)
was leaking all unread DLS and peak columns into data_system_document
.custom_information_document as N/A placeholders. The CSV conversion lib
then produces duplicate columns — one from the N/A metadata, one from
the actual structured data.

- Add skip patterns in create_metadata for DLS, peak, and measurement
  columns that are handled elsewhere in the ASM
- Fix type annotation for CALCULATED_DATA_LOOKUP to accept alt_columns
- Fix type narrowing in structure test

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
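
The skip-pattern approach described in the second commit might look roughly like the sketch below. It assumes the unread header columns arrive as a plain dict and that column names are matched case-insensitively. The pattern list and the function name `filter_metadata_custom_info` are hypothetical, as are the example column names; the actual patterns added to create_metadata are not listed in this PR text.

```python
import re
from typing import Dict

# Hypothetical patterns for columns that are structured elsewhere in the ASM.
SKIP_PATTERNS = [
    re.compile(r"^pk\d+\b"),                    # per-peak columns such as "pk1 ..."
    re.compile(r"^z[- ]av[eg]\b"),              # "z-avg dia. (nm)" and "z ave. dia (nm)"
    re.compile(r"^number of (peaks|angles)$"),  # DLS summary counts
]


def filter_metadata_custom_info(unread: Dict[str, str]) -> Dict[str, str]:
    """Drop unread columns that are already handled as structured data."""
    kept: Dict[str, str] = {}
    for key, value in unread.items():
        normalized = key.strip().lower()
        if any(pattern.search(normalized) for pattern in SKIP_PATTERNS):
            # Skipping here avoids emitting an N/A placeholder that would
            # later become a second CSV column next to the structured value.
            continue
        kept[key] = value
    return kept


unread = {
    "Pk1 Mean Dia (nm)": "N/A",
    "Z-Avg Dia. (nm)": "N/A",
    "Number of Peaks": "N/A",
    "Instrument ID": "S-001",
}
filtered = filter_metadata_custom_info(unread)
```

Only columns with no structured home (here the made-up "Instrument ID") remain in the custom information document.
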
@nathan-stender nathan-stender requested review from a team and slopez-b as code owners June 1, 2026 18:54
@nathan-stender nathan-stender merged commit 3c5d33d into main Jun 1, 2026
9 checks passed
@nathan-stender nathan-stender deleted the fix/stunner-dls-column-duplicates branch June 1, 2026 20:32
nathan-stender added a commit that referenced this pull request Jun 1, 2026
### Fixed

- Resolve duplicate columns in Stunner DLS output (#1213)
- Correct peak height and area unit conversions in Chromeleon parser (#1212)
nathan-stender added a commit that referenced this pull request Jun 1, 2026
#1215)

## Summary
- PkOI (Peak of Interest) columns were being silently discarded —
consumed by peak extraction to prevent custom info leaking, but never
written to output
- Added `alt_columns` for new `pkoi` naming convention to existing
`CALCULATED_DATA_LOOKUP` Peak of Interest entries
- Added two new calculated data entries: Peak of Interest Mass Mean
Diameter and Peak of Interest Rayleigh Ratio R (new fields in newer
instrument software)
- All 72 columns from the customer file are now accounted for in the ASM
output

## Context
Follow-up to #1213. The `CALCULATED_DATA_LOOKUP` already had "Peak of
Interest" entries but they used old column names (`peak of interest mean
dia (nm)`) that didn't match the new format (`pkoi intensity mean dia.
(nm)`). Meanwhile, `_extract_peak_data` was consuming the PkOI columns
to prevent leaking, effectively discarding the data entirely.

## Test plan
- [x] All 34 Stunner/Lunatic tests pass
- [x] Lint clean (ruff, black, mypy)
- [x] Customer file produces 510 Peak of Interest calculated data docs
across 6 fields
- [x] All 72 original columns validated as present in ASM output
- [x] 60/85 measurements have real PkOI values, 25 correctly report N/A
(blanks/no-peaks)
- [x] No duplicate columns in data system document

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>