fix: Resolve duplicate columns in Stunner DLS output #1213
Merged
The Stunner parser was not recognizing alternate column naming conventions used by newer instrument software, causing DLS fields to leak into measurement_custom_info while also appearing as structured (but empty) calculated data documents. The result was duplicate columns in connector output.

- Add alt_columns support to CALCULATED_DATA_LOOKUP for column name variants (e.g. "z-avg dia. (nm)" vs "z ave. dia (nm)", "rayleigh ratio r (1/km)" vs "rayleigh ratio r (cm^-1)", "viscosity at 25°c" vs "viscosity at 20°c")
- Add number_of_peaks, number_of_angles, angles_measured, and intercept to CALCULATED_DATA_LOOKUP as structured DLS calculated data
- Fix number_of_averages to also try "number of acquisitions (total)"
- Fix device_control_custom_info to try "number of acquisitions used (total)"
- Add sample_role_type (B/S/F/R) to sample_custom_info
- Add anonymized test file exercising new column format

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
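A minimal sketch of how an `alt_columns` fallback could work, assuming each lookup entry carries one primary column name plus a list of alternates. The `LOOKUP` structure, the `resolve_value` helper, and the choice of which spelling is primary are illustrative assumptions, not the actual `CALCULATED_DATA_LOOKUP` definition.

```python
from typing import Any, Optional

# Hypothetical, simplified stand-in for CALCULATED_DATA_LOOKUP entries.
# Which spelling is "primary" and which is "alternate" is an assumption.
LOOKUP = [
    {
        "name": "z-average diameter",
        "column": "z ave. dia (nm)",
        "alt_columns": ["z-avg dia. (nm)"],
    },
    {
        "name": "viscosity",
        "column": "viscosity at 20°c",
        "alt_columns": ["viscosity at 25°c"],
    },
]


def resolve_value(row: dict, entry: dict) -> Optional[Any]:
    """Return the value under the primary column, else the first matching alternate."""
    # Normalize header spelling so lookups are case-insensitive.
    normalized = {key.strip().lower(): value for key, value in row.items()}
    for candidate in [entry["column"], *entry.get("alt_columns", [])]:
        if candidate in normalized:
            return normalized[candidate]
    return None  # column absent in this file format


new_format_row = {"Z-Avg Dia. (nm)": 12.4, "Viscosity at 25°C": 0.89}
old_format_row = {"Z Ave. Dia (nm)": 11.9}

z_new = resolve_value(new_format_row, LOOKUP[0])
z_old = resolve_value(old_format_row, LOOKUP[0])
```

With a fallback of this kind, both export formats populate the same structured field, so the value no longer falls through to custom info under its alternate name.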
The header SeriesData (read from the first data row in no-header mode) was leaking all unread DLS and peak columns into data_system_document.custom_information_document as N/A placeholders. The CSV conversion lib then produces duplicate columns: one from the N/A metadata, one from the actual structured data.

- Add skip patterns in create_metadata for DLS, peak, and measurement columns that are handled elsewhere in the ASM
- Fix type annotation for CALCULATED_DATA_LOOKUP to accept alt_columns
- Fix type narrowing in structure test

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
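The skip-pattern idea can be sketched as a filter applied to the unread header columns before they are copied into custom info. The patterns, the `filter_header_custom_info` helper, and the `Plate ID` and `Peak 1 Mean Dia (nm)` column names are hypothetical; the real patterns in `create_metadata` are not shown in this PR text.

```python
import re

# Hypothetical patterns for columns assumed to be handled elsewhere in the ASM.
SKIP_PATTERNS = [
    re.compile(r"^z[- ]av", re.IGNORECASE),                   # Z-average diameter variants
    re.compile(r"^peak \d+", re.IGNORECASE),                  # per-peak columns
    re.compile(r"^number of (peaks|angles)$", re.IGNORECASE),
    re.compile(r"^intercept$", re.IGNORECASE),
]


def filter_header_custom_info(unread: dict) -> dict:
    """Drop unread header columns that match a skip pattern."""
    return {
        column: value
        for column, value in unread.items()
        if not any(pattern.search(column) for pattern in SKIP_PATTERNS)
    }


unread_header_columns = {
    "Z-Avg Dia. (nm)": "N/A",
    "Peak 1 Mean Dia (nm)": "N/A",
    "Number of Peaks": "N/A",
    "Intercept": "N/A",
    "Plate ID": "P-001",
}
custom_info = filter_header_custom_info(unread_header_columns)
```

Only columns with no structured home survive the filter, so the N/A placeholders never reach the data system document.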
stephenworlow approved these changes Jun 1, 2026
nathan-stender added a commit that referenced this pull request Jun 1, 2026
nathan-stender added a commit that referenced this pull request Jun 1, 2026
#1215)

## Summary
- PkOI (Peak of Interest) columns were being silently discarded: they were consumed by peak extraction to prevent custom info leaking, but never written to output
- Added `alt_columns` for new `pkoi` naming convention to existing `CALCULATED_DATA_LOOKUP` Peak of Interest entries
- Added two new calculated data entries: Peak of Interest Mass Mean Diameter and Peak of Interest Rayleigh Ratio R (new fields in newer instrument software)
- All 72 columns from the customer file are now accounted for in the ASM output

## Context
Follow-up to #1213. The `CALCULATED_DATA_LOOKUP` already had "Peak of Interest" entries but they used old column names (`peak of interest mean dia (nm)`) that didn't match the new format (`pkoi intensity mean dia. (nm)`). Meanwhile, `_extract_peak_data` was consuming the PkOI columns to prevent leaking, effectively discarding the data entirely.

## Test plan
- [x] All 34 Stunner/Lunatic tests pass
- [x] Lint clean (ruff, black, mypy)
- [x] Customer file produces 510 Peak of Interest calculated data docs across 6 fields
- [x] All 72 original columns validated as present in ASM output
- [x] 60/85 measurements have real PkOI values, 25 correctly report N/A (blanks/no-peaks)
- [x] No duplicate columns in data system document

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
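The "consumed but never written" failure described in this commit can be caught with a small bookkeeping check. The sketch below is an assumption about how such an audit might look; `ColumnAudit` and the `Sample Name` column are hypothetical and not part of the parser.

```python
class ColumnAudit:
    """Track which source columns a parser consumed and which it actually emitted."""

    def __init__(self, source_columns):
        self.source = {self._norm(column) for column in source_columns}
        self.consumed = set()
        self.written = set()

    @staticmethod
    def _norm(column):
        return column.strip().lower()

    def consume(self, column):
        # Reading a column marks it as handled, which stops it leaking to custom info.
        self.consumed.add(self._norm(column))

    def write(self, column):
        # Writing implies the column was read first.
        self.consume(column)
        self.written.add(self._norm(column))

    def dropped(self):
        """Columns that were read but never reached the output."""
        return sorted(self.consumed - self.written)

    def unread(self):
        """Columns nobody read; these would leak to custom info."""
        return sorted(self.source - self.consumed)


audit = ColumnAudit(["Sample Name", "Z-Avg Dia. (nm)", "PkOI Intensity Mean Dia. (nm)"])
audit.write("Sample Name")
audit.write("Z-Avg Dia. (nm)")
audit.consume("PkOI Intensity Mean Dia. (nm)")  # consumed, but never emitted
```

A non-empty `dropped()` result flags exactly the situation described here: data that is hidden from custom info yet absent from the structured output.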
## Summary
- Added `alt_columns` support for column name variants (`Z-Avg Dia. (nm)` vs `Z Ave. Dia (nm)`, `Rayleigh ratio R (1/km)` vs `(cm^-1)`, `Viscosity at 25°C` vs `20°C`)
- DLS fields (`Number of Peaks`, `Number of Angles`, `Angles Measured`, `Intercept`) are now structured as calculated data documents instead of leaking to custom info
- `number_of_averages` and `Number of acquisitions used` now try both `(total)` and non-`(total)` column name variants
- `B/S/F/R` (sample role type) added to sample custom info
- Removed the N/A placeholders in `data system document.custom information document` that were causing duplicate columns in connector CSV output

## Context
Follow-up to #1205. Customer (ProFound) reported 17 remaining duplicate columns after the PkN fix. Root cause was twofold:
- `CALCULATED_DATA_LOOKUP` did not recognize newer column naming conventions, so DLS fields fell through `get_unread()` into `measurement_custom_info`
- The header SeriesData leaked unread DLS and peak columns into `data_system_document.custom_information_document` as N/A placeholders; the CSV conversion lib then produced one column from the placeholder and one from the actual structured data

## Test plan
- `measurement_custom_info` content
- `data_system_document.custom_information_document` is empty (was 52 N/A keys)

🤖 Generated with Claude Code
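A check for duplicate columns in converted CSV output, as targeted by this test plan, might look like the sketch below. It assumes the converted output is available as a list of header names; `duplicate_columns` and the sample headers are hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def duplicate_columns(header: Iterable[str]) -> List[str]:
    """Return normalized column names that appear more than once in a CSV header."""
    counts = Counter(name.strip().lower() for name in header)
    return sorted(name for name, count in counts.items() if count > 1)


# Before the fix: one column from the N/A placeholder, one from the structured data.
before_fix = ["Sample Name", "Z-Avg Dia. (nm)", "Intercept", "z-avg dia. (nm)"]
after_fix = ["Sample Name", "Z-Avg Dia. (nm)", "Intercept"]

duplicates_before = duplicate_columns(before_fix)
duplicates_after = duplicate_columns(after_fix)
```

Comparing case-insensitively is a design choice in this sketch: the placeholder and the structured column may differ only in capitalization.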