refactor(dada): dedup ASV serialization across dada handlers (#13) #31
Merged
…pseudo (#13)

Hoist the per-handler-local `AsvEntry` and `DadaStats` structs to module level and extract the duplicated cluster→ASV conversion and JSON serialization into shared helpers:

- `birth_type_str()`: the 4-arm `BirthType` match (was duplicated 3x)
- `asv_entry_from_cluster()`: decode + birth_type + field copy; takes `abundance` explicitly so the pooled path passes its recomputed per-sample read count (was duplicated 3x)
- `to_json()`: compact-vs-pretty serialization (was duplicated 3x)

No behavioral change: output JSON schema and values are byte-identical. Net -67 lines.

The single-input `dada` handler keeps its own `DadaOutput` since it carries the aux-only fields.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
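The three helpers described in the commit message above can be sketched roughly as follows. This is a minimal, std-only sketch under stated assumptions, not the crate's code: the `BirthType` variant names and their single-letter codes, the hypothetical `Cluster` fields (`encoded_seq`, `reads`, `birth_pval`), the 0..=3 base encoding, and the JSON field names are all illustrative stand-ins. `to_json` is hand-rolled here only to avoid an external dependency; a real handler would normally delegate the compact/pretty choice to a serializer crate.

```rust
// Std-only sketch. Every type, field, variant, and code letter below is a
// hypothetical stand-in; the crate's real definitions may differ.

/// Hypothetical birth types (the PR only says the match has 4 arms).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum BirthType {
    Initial,
    Abundance,
    Prior,
    Singleton,
}

/// Hypothetical cluster record; the sequence is stored as base indices
/// 0..=3 and must be decoded before output.
pub struct Cluster {
    pub encoded_seq: Vec<u8>,
    pub reads: u32,
    pub birth_type: BirthType,
    pub birth_pval: f64,
}

/// Output row declared once at module level instead of inside each handler.
#[derive(Debug, Clone, PartialEq)]
pub struct AsvEntry {
    pub sequence: String,
    pub abundance: u32,
    pub birth_type: &'static str,
    pub birth_pval: f64,
}

/// The single 4-arm match every handler calls.
pub fn birth_type_str(bt: &BirthType) -> &'static str {
    match bt {
        BirthType::Initial => "I",
        BirthType::Abundance => "A",
        BirthType::Prior => "P",
        BirthType::Singleton => "S",
    }
}

/// Hypothetical decoder: maps base indices to nucleotides, anything else to 'N'.
fn decode(encoded: &[u8]) -> String {
    encoded
        .iter()
        .map(|&b| match b {
            0 => 'A',
            1 => 'C',
            2 => 'G',
            3 => 'T',
            _ => 'N',
        })
        .collect()
}

/// Decode + birth_type + field copy. `abundance` is a parameter so a pooled
/// per-sample caller can pass its recomputed read count instead of `reads`.
pub fn asv_entry_from_cluster(cluster: &Cluster, abundance: u32) -> AsvEntry {
    AsvEntry {
        sequence: decode(&cluster.encoded_seq),
        abundance,
        birth_type: birth_type_str(&cluster.birth_type),
        birth_pval: cluster.birth_pval,
    }
}

/// Compact-vs-pretty choice in one place. Hand-rolled to stay std-only;
/// strings are not escaped, which is safe only for the ASCII fields here.
pub fn to_json(entries: &[AsvEntry], compact: bool) -> String {
    if entries.is_empty() {
        return "[]".to_string();
    }
    // Newline, object indent, field indent, key/value separator.
    let (nl, ind1, ind2, kv) = if compact {
        ("", "", "", ":")
    } else {
        ("\n", "  ", "    ", ": ")
    };
    let item_sep = format!(",{nl}");
    let rows: Vec<String> = entries
        .iter()
        .map(|e| {
            let fields = [
                format!("{ind2}\"sequence\"{kv}\"{}\"", e.sequence),
                format!("{ind2}\"abundance\"{kv}{}", e.abundance),
                format!("{ind2}\"birth_type\"{kv}\"{}\"", e.birth_type),
                format!("{ind2}\"birth_pval\"{kv}{}", e.birth_pval),
            ];
            format!("{ind1}{{{nl}{}{nl}{ind1}}}", fields.join(item_sep.as_str()))
        })
        .collect();
    format!("[{nl}{}{nl}]", rows.join(item_sep.as_str()))
}
```

In this sketch a single-sample handler would call `asv_entry_from_cluster(&cluster, cluster.reads)`, while the pooled per-sample path would pass its recomputed count; both then share one `to_json(&entries, compact)` call. Keeping `abundance` as a parameter rather than reading it inside the helper is what lets all three handlers reuse the same conversion.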
Summary
Addresses #13. During `dada-pooled` codegen, redundant code was flagged across the `dada` handlers; `dada-pseudo` has since been added and is now folded into scope.

This is a behavior-preserving dedup pass. Output JSON schema and values are byte-identical, as verified by the existing `dada_from_fastq_matches_dada_from_derep_json`, `dada_multi_input_matches_per_file_runs`, and pooled/pseudo determinism and downstream-feed tests.

What changed
Hoisted the per-handler-local `AsvEntry`/`DadaStats` structs to module level and extracted three duplicated blocks (each appeared 3×) into shared helpers:

- `birth_type_str(&BirthType)`: the 4-arm `BirthType` match
- `asv_entry_from_cluster(cluster, abundance)`: decode + birth_type + field copy; `abundance` is explicit so the pooled per-sample path passes its recomputed read count
- `to_json(value, compact)`: compact-vs-pretty serialization

Net −67 lines (62 insertions, 129 deletions).

The single-input `dada` handler keeps its own `DadaOutput` because it carries aux-only fields (`ClusterStatJson`/`BirthSubJson`/`AuxJson`); fully hoisting that would pull in the aux types for marginal gain.

Deliberately out of scope
Lower-value / higher-churn items left for a possible follow-up: rayon pool-init boilerplate, the `[dada]`/`[dada-pooled]`/`[dada-pseudo]` log tags, prior-FASTA loading, and sample-name resolution.

Verification
- `cargo build` clean
- `cargo clippy` clean

🤖 Generated with Claude Code