Background
I'm the maintainer of atropos, which is being archived in favor of actively maintained tools like fastp. I'm surfacing atropos features that have no fastp equivalent, in case any are of interest.
Proposal
Add a standalone fastp detect (or --detect-only) mode that scans the first N reads and reports candidate adapter/contaminant sequences without performing any trimming. Today fastp auto-detects adapters as a side effect of trimming; there's no way to run detection in isolation.
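To make "scans the first N reads" concrete, here is a minimal Python sketch of the sampling step. It is only an illustration, not fastp or atropos code: `first_n_reads` is a hypothetical helper, and it assumes plain four-line FASTQ records with no wrapped sequence lines.

```python
import io
from itertools import islice
from typing import Iterable, Iterator


def first_n_reads(lines: Iterable[str], n: int) -> Iterator[str]:
    """Yield the sequence line of the first n FASTQ records.

    Hypothetical helper. Assumes four lines per record:
    header, sequence, '+', qualities.
    """
    # Sequence lines sit at index 1 of every 4-line record.
    seq_lines = islice(lines, 1, None, 4)
    for line in islice(seq_lines, n):
        yield line.strip()


if __name__ == "__main__":
    # Tiny in-memory FASTQ standing in for a real file handle.
    demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n")
    for seq in first_n_reads(demo, 1):
        print(seq)
```

Because the function only pulls the lines it needs, a detect-only pass over a large file would stop after N records and never read the rest.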
Why this is useful
When a user receives FASTQs with no library-prep metadata, the first question is "what adapter do I trim?". A detect-only mode lets them inspect the answer before committing to a trimming pass — especially helpful in automated pipelines that want to log detected adapters separately from the trimming step, or want to fail fast when detection is inconclusive.
Suggested algorithms (atropos offers all three; fastp could pick any subset)
- Known-contaminant scan — match against a bundled database of common adapter sequences.
- Heuristic — find the longest common suffix across a random read sample.
- khmer-style k-mer frequency — flag over-represented k-mers in the 3′ end.
fastp's existing evaluator.cpp already does passing-band overrepresentation analysis, so the infrastructure is partly there.
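As a rough illustration of the first and third options, the Python sketch below shows a known-contaminant scan and a 3′ k-mer over-representation check. Everything here is an assumption made for illustration: the function names, the thresholds, and the two-entry adapter table are hypothetical and are not taken from atropos or fastp, and a plain `Counter` stands in for khmer-style counting.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Illustrative entries only; not the database bundled with any tool.
KNOWN_ADAPTERS: Dict[str, str] = {
    "Illumina TruSeq (common prefix)": "AGATCGGAAGAGC",
    "Nextera transposase": "CTGTCTCTTATA",
}


def scan_known(
    reads: Iterable[str], adapters: Dict[str, str] = KNOWN_ADAPTERS
) -> Dict[str, float]:
    """Fraction of reads containing each adapter as an exact substring."""
    sample = list(reads)
    if not sample:
        return {name: 0.0 for name in adapters}
    return {
        name: sum(seq in read for read in sample) / len(sample)
        for name, seq in adapters.items()
    }


def overrepresented_kmers(
    reads: Iterable[str], k: int = 12, tail: int = 30, min_fraction: float = 0.1
) -> List[Tuple[str, float]]:
    """K-mers seen in the last `tail` bases of at least min_fraction of reads."""
    counts: Counter = Counter()
    total = 0
    for read in reads:
        total += 1
        window = read[-tail:]
        # Count each k-mer once per read so internal repeats do not inflate it.
        counts.update({window[i:i + k] for i in range(len(window) - k + 1)})
    if total == 0:
        return []
    hits = [(kmer, c / total) for kmer, c in counts.items() if c / total >= min_fraction]
    # Most frequent first; ties broken alphabetically for stable output.
    return sorted(hits, key=lambda kv: (-kv[1], kv[0]))
```

A detect-only mode could run both over the same read sample, report the resulting fractions, and exit non-zero when nothing clears the threshold, which would give pipelines the fail-fast behavior described above.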
Prior art
--info-file is a weaker analog (reports adapter matches alongside trimming, not a dedicated detection mode).
Happy to help if you want to pursue this.