fix: Resolve duplicate columns in Stunner DLS output by nathan-stender · Pull Request #1213 · Benchling-Open-Source/allotropy

nathan-stender · 2026-06-01T18:54:03Z

Summary

- Stunner parser now recognizes alternate DLS column names used by newer instrument software (e.g., Z-Avg Dia. (nm) vs Z Ave. Dia (nm), Rayleigh ratio R (1/km) vs (cm^-1), Viscosity at 25°C vs 20°C)
- Previously unmapped DLS fields (Number of Peaks, Number of Angles, Angles Measured, Intercept) are now structured as calculated data documents instead of leaking to custom info
- number_of_averages and Number of acquisitions used now try both (total) and non-(total) column name variants
- B/S/F/R (sample role type) added to sample custom info
- Removed 52 N/A placeholder columns from data system document.custom information document that were causing duplicate columns in connector CSV output
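The alternate-name lookup described in the first bullet can be sketched roughly as follows. This is a minimal illustration, not the parser's actual code: `CalcDataSpec`, `find_value`, and the sample rows are hypothetical, and which spelling counts as the primary name versus the alternate is an assumption. Only the column-name strings come from this PR.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CalcDataSpec:
    """Hypothetical lookup entry: one primary column name plus alternates."""

    column: str
    alt_columns: tuple[str, ...] = ()


# Hypothetical stand-ins for two lookup entries (primary/alternate order assumed).
Z_AVERAGE = CalcDataSpec("z ave. dia (nm)", alt_columns=("z-avg dia. (nm)",))
VISCOSITY = CalcDataSpec("viscosity at 20°c", alt_columns=("viscosity at 25°c",))


def find_value(row: dict[str, str], spec: CalcDataSpec) -> str | None:
    """Return the value under the first matching column name, or None."""
    # Compare case-insensitively so header capitalization does not matter.
    normalized = {key.strip().lower(): value for key, value in row.items()}
    for name in (spec.column, *spec.alt_columns):
        if name in normalized:
            return normalized[name]
    return None


# Sample values are invented for illustration only.
old_row = {"Z Ave. Dia (nm)": "12.4"}
new_row = {"Z-Avg Dia. (nm)": "12.9"}

print(find_value(old_row, Z_AVERAGE))  # 12.4
print(find_value(new_row, Z_AVERAGE))  # 12.9
print(find_value(new_row, VISCOSITY))  # None
```

The same fallback idea would cover the "(total)" and non-"(total)" variants: list both spellings and take whichever one is present.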

Context

Follow-up to #1205. Customer (ProFound) reported 17 remaining duplicate columns after the PkN fix. Root cause was twofold:

1. CALCULATED_DATA_LOOKUP did not recognize newer column naming conventions, so DLS fields fell through get_unread() into measurement_custom_info
2. When the parser reads metadata from the first data row (no-header mode), unread DLS/peak columns leaked into data_system_document.custom_information_document as N/A placeholders; the CSV conversion lib then produced one column from the placeholder and one from the actual structured data
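To make the second cause concrete, here is a small sketch of how a placeholder and a structured value can end up as two columns with the same name. It assumes, purely for illustration, that flattening names each column after its leaf key; the real CSV conversion library's naming rules are not shown in this PR, and `to_columns` and both sample documents are hypothetical.

```python
from __future__ import annotations


def to_columns(document: dict) -> list[tuple[str, str]]:
    """Flatten a nested dict into (column name, value) pairs named by leaf key."""
    columns: list[tuple[str, str]] = []
    for key, value in document.items():
        if isinstance(value, dict):
            columns.extend(to_columns(value))
        else:
            columns.append((key, str(value)))
    return columns


def duplicate_names(document: dict) -> list[str]:
    """Return the column names that appear more than once after flattening."""
    names = [name for name, _ in to_columns(document)]
    return sorted({name for name in names if names.count(name) > 1})


# Hypothetical document shapes; the value 12.9 is invented.
before_fix = {
    "data system document": {
        "custom information document": {"Z-Avg Dia. (nm)": "N/A"},
    },
    "calculated data": {"Z-Avg Dia. (nm)": 12.9},
}
after_fix = {
    "data system document": {"custom information document": {}},
    "calculated data": {"Z-Avg Dia. (nm)": 12.9},
}

print(duplicate_names(before_fix))  # ['Z-Avg Dia. (nm)']
print(duplicate_names(after_fix))   # []
```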

Test plan

- All 34 Stunner/Lunatic tests pass (33 existing + 1 new)
- All 12 Lunatic discovery tests pass
- New anonymized test file (example03) exercises the new PkN column format with blank, no-peaks, single-peak, and multi-peak measurements
- Verified customer file: 0/85 measurements have measurement_custom_info content
- Verified data_system_document.custom_information_document is empty (was 52 N/A keys)
- Lint clean (black, ruff, mypy)

🤖 Generated with Claude Code

The Stunner parser was not recognizing alternate column naming conventions used by newer instrument software, causing DLS fields to leak into measurement_custom_info while also appearing as structured (but empty) calculated data documents, which produced duplicate columns in connector output.

- Add alt_columns support to CALCULATED_DATA_LOOKUP for column name variants (e.g. "z-avg dia. (nm)" vs "z ave. dia (nm)", "rayleigh ratio r (1/km)" vs "rayleigh ratio r (cm^-1)", "viscosity at 25°c" vs "viscosity at 20°c")
- Add number_of_peaks, number_of_angles, angles_measured, and intercept to CALCULATED_DATA_LOOKUP as structured DLS calculated data
- Fix number_of_averages to also try "number of acquisitions (total)"
- Fix device_control_custom_info to try "number of acquisitions used (total)"
- Add sample_role_type (B/S/F/R) to sample_custom_info
- Add anonymized test file exercising new column format

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The header SeriesData (read from the first data row in no-header mode) was leaking all unread DLS and peak columns into data_system_document.custom_information_document as N/A placeholders. The CSV conversion lib then produces duplicate columns: one from the N/A metadata, one from the actual structured data.

- Add skip patterns in create_metadata for DLS, peak, and measurement columns that are handled elsewhere in the ASM
- Fix type annotation for CALCULATED_DATA_LOOKUP to accept alt_columns
- Fix type narrowing in structure test

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
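The skip-pattern idea from this commit could look something like the sketch below. The patterns, the helper names `is_handled_elsewhere` and `unread_custom_info`, and the sample header row are all hypothetical; the actual list used in create_metadata is not shown in this PR.

```python
from __future__ import annotations

import re

# Hypothetical patterns for columns that are structured elsewhere in the ASM.
SKIP_PATTERNS = (
    re.compile(r"^pk\d+\b", re.IGNORECASE),  # per-peak columns such as "Pk1 ..."
    re.compile(r"^z-avg\b", re.IGNORECASE),  # DLS summary column
    re.compile(r"^number of (peaks|angles)$", re.IGNORECASE),
)


def is_handled_elsewhere(column: str) -> bool:
    """True if the column name matches any skip pattern."""
    return any(pattern.search(column) for pattern in SKIP_PATTERNS)


def unread_custom_info(header_row: dict[str, str], already_read: set[str]) -> dict[str, str]:
    """Keep only unread columns that are not handled elsewhere."""
    return {
        column: value
        for column, value in header_row.items()
        if column not in already_read and not is_handled_elsewhere(column)
    }


# Invented header row; the column names and values are for illustration only.
header_row = {
    "Application": "DLS",
    "Pk1 Mean Dia (nm)": "N/A",
    "Z-Avg Dia. (nm)": "N/A",
    "Number of Peaks": "N/A",
    "Operator": "example-user",
}

print(unread_custom_info(header_row, already_read={"Application"}))
# {'Operator': 'example-user'}
```

Matching on patterns rather than exact names is one way to cover a numbered family of columns (Pk1, Pk2, ...) without enumerating each one.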


fix: Report Peak of Interest data as calculated data in Stunner parser (#1215)

## Summary

- PkOI (Peak of Interest) columns were being silently discarded: they were consumed by peak extraction to prevent custom info leaking, but never written to output
- Added `alt_columns` for new `pkoi` naming convention to existing `CALCULATED_DATA_LOOKUP` Peak of Interest entries
- Added two new calculated data entries: Peak of Interest Mass Mean Diameter and Peak of Interest Rayleigh Ratio R (new fields in newer instrument software)
- All 72 columns from the customer file are now accounted for in the ASM output

## Context

Follow-up to #1213. The `CALCULATED_DATA_LOOKUP` already had "Peak of Interest" entries but they used old column names (`peak of interest mean dia (nm)`) that didn't match the new format (`pkoi intensity mean dia. (nm)`). Meanwhile, `_extract_peak_data` was consuming the PkOI columns to prevent leaking, effectively discarding the data entirely.

## Test plan

- [x] All 34 Stunner/Lunatic tests pass
- [x] Lint clean (ruff, black, mypy)
- [x] Customer file produces 510 Peak of Interest calculated data docs across 6 fields
- [x] All 72 original columns validated as present in ASM output
- [x] 60/85 measurements have real PkOI values, 25 correctly report N/A (blanks/no-peaks)
- [x] No duplicate columns in data system document

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
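The difference between consuming a column and reporting it can be illustrated with a small sketch. The `Row` class below is a hypothetical stand-in for a reader that tracks which columns were read; `extract_discarding`, `extract_reporting`, the output key, and the sample value are likewise invented for illustration. Only the column name `pkoi intensity mean dia. (nm)` comes from the description above.

```python
from __future__ import annotations


class Row:
    """Hypothetical row reader that remembers which columns were read."""

    def __init__(self, data: dict[str, str]) -> None:
        self._data = dict(data)
        self._read: set[str] = set()

    def get(self, key: str) -> str | None:
        self._read.add(key)
        return self._data.get(key)

    def mark_read(self, key: str) -> None:
        self._read.add(key)

    def get_unread(self) -> dict[str, str]:
        return {k: v for k, v in self._data.items() if k not in self._read}


PKOI_COLUMN = "pkoi intensity mean dia. (nm)"


def extract_discarding(row: Row) -> dict[str, str]:
    """Behavior as described before the fix: consume the column, write nothing."""
    row.mark_read(PKOI_COLUMN)
    return {}


def extract_reporting(row: Row) -> dict[str, str]:
    """Behavior as described after the fix: read the value and report it."""
    value = row.get(PKOI_COLUMN)
    # The output key is a hypothetical name, not the ASM field name.
    return {} if value is None else {"peak of interest mean diameter": value}


sample = {PKOI_COLUMN: "8.7"}  # invented value

row_before = Row(sample)
print(extract_discarding(row_before), row_before.get_unread())  # {} {}

row_after = Row(sample)
print(extract_reporting(row_after), row_after.get_unread())
# {'peak of interest mean diameter': '8.7'} {}
```

In both cases nothing leaks into the unread columns, which is why the loss was silent: the only visible difference is whether the value reaches the output.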

nathan-stender and others added 2 commits June 1, 2026 14:47

nathan-stender requested review from a team and slopez-b as code owners June 1, 2026 18:54

nathan-stender requested a review from stephenworlow June 1, 2026 18:54

stephenworlow approved these changes Jun 1, 2026


nathan-stender merged commit 3c5d33d into main Jun 1, 2026
9 checks passed

nathan-stender deleted the fix/stunner-dls-column-duplicates branch June 1, 2026 20:32

nathan-stender mentioned this pull request Jun 1, 2026

release: Update allotropy version to 0.1.131 #1214

Merged

nathan-stender added a commit that referenced this pull request Jun 1, 2026

release: Update allotropy version to 0.1.131 (#1214)

5f9d227

### Fixed

- Resolve duplicate columns in Stunner DLS output (#1213)
- Correct peak height and area unit conversions in Chromeleon parser (#1212)

nathan-stender mentioned this pull request Jun 1, 2026

fix: Report Peak of Interest data as calculated data in Stunner parser #1215

Merged

6 tasks

nathan-stender merged 2 commits into main from fix/stunner-dls-column-duplicates

2 participants
