Feature request: standalone adapter-detection subcommand #690

@jdidion

Background

I'm the maintainer of atropos, which is being archived in favor of actively maintained tools like fastp. I'm surfacing atropos features that don't have a fastp equivalent, in case any of them are of interest.

Proposal

Add a standalone fastp detect (or --detect-only) mode that scans the first N reads and reports candidate adapter/contaminant sequences without performing any trimming. Today fastp auto-detects adapters as a side effect of trimming; there's no way to run detection in isolation.
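As a rough illustration of the detect-only flow, here is a minimal Python sketch that assumes well-formed 4-line FASTQ input. The names read_first_n and detect_only are hypothetical (they are not fastp APIs), and the detector callback stands in for whichever detection algorithm is chosen. Nothing is trimmed or written; the function only returns what the detector reports.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def read_first_n(lines: Iterable[str], n: int) -> List[str]:
    """Return the sequence lines of the first n FASTQ records.

    Assumes 4-line FASTQ (header, sequence, '+', quality); multi-line
    FASTQ is not handled in this sketch.
    """
    it: Iterator[str] = iter(lines)
    seqs: List[str] = []
    while len(seqs) < n:
        record = list(islice(it, 4))
        if len(record) < 4:
            break  # end of input (or a truncated final record)
        header, seq, plus, _qual = (line.rstrip("\n") for line in record)
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        seqs.append(seq)
    return seqs


def detect_only(lines: Iterable[str], n: int,
                detector: Callable[[List[str]], List[str]]) -> List[str]:
    """Hypothetical detect-only pass: sample the first n reads, report, trim nothing."""
    return detector(read_first_n(lines, n))
```

Because the reader stops after n records, a detect-only run on a large file costs only as much I/O as the sample, which is what makes it cheap enough to run as a separate, loggable pipeline step.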

Why this is useful

When a user receives FASTQs with no library-prep metadata, the first question is "what adapter do I trim?". A detect-only mode lets them inspect the answer before committing to a trimming pass. This is especially helpful in automated pipelines that want to log detected adapters separately from the trimming step, or that want to fail fast when detection is inconclusive.

Suggested algorithms (atropos offers all three; fastp could pick any subset)

  1. Known-contaminant scan — match against a bundled database of common adapter sequences.
  2. Heuristic — find the longest common suffix across a random read sample.
  3. khmer-style k-mer frequency — flag over-represented k-mers in the 3′ end.
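The three approaches above can each be sketched in a few lines of Python. This is a hedged illustration, not atropos's or fastp's actual implementation: the function names, the tiny KNOWN_ADAPTERS database (just the commonly used TruSeq and Nextera adapter prefixes), and the thresholds (seed_len, min_fraction, k, window) are assumptions chosen for readability. The k-mer pass uses a plain Counter rather than khmer itself, and the suffix heuristic adds an optional min_fraction relaxation; at its default of 1.0 it returns the strict longest common suffix of the sample.

```python
import random
from collections import Counter
from typing import Dict, List

# Illustrative contaminant database (assumption: a real tool would bundle
# a much larger list). Commonly used TruSeq and Nextera adapter prefixes.
KNOWN_ADAPTERS: Dict[str, str] = {
    "TruSeq": "AGATCGGAAGAGC",
    "Nextera": "CTGTCTCTTATA",
}


def known_contaminant_scan(reads: List[str], db: Dict[str, str] = KNOWN_ADAPTERS,
                           seed_len: int = 10) -> Dict[str, float]:
    """1. Fraction of reads containing the first seed_len bases of each known adapter."""
    if not reads:
        return {name: 0.0 for name in db}
    return {
        name: sum(adapter[:seed_len] in read for read in reads) / len(reads)
        for name, adapter in db.items()
    }


def longest_common_suffix(reads: List[str], sample_size: int = 1000,
                          min_fraction: float = 1.0, rng_seed: int = 0) -> str:
    """2. Longest suffix shared by >= min_fraction of a random read sample."""
    if not reads:
        return ""
    sample = random.Random(rng_seed).sample(reads, min(sample_size, len(reads)))
    best = ""
    for length in range(1, min(len(r) for r in sample) + 1):
        suffix, count = Counter(r[-length:] for r in sample).most_common(1)[0]
        if count / len(sample) < min_fraction:
            # Every longer suffix ends in some suffix of this length, so
            # none of them can reach the threshold either.
            break
        best = suffix
    return best


def overrepresented_3prime_kmers(reads: List[str], k: int = 8, window: int = 20,
                                 min_fraction: float = 0.5) -> List[str]:
    """3. k-mers present in the 3' window of >= min_fraction of reads."""
    per_read: Counter = Counter()
    for read in reads:
        tail = read[-window:]
        # Count each distinct k-mer once per read so repeats don't inflate support.
        per_read.update({tail[i:i + k] for i in range(len(tail) - k + 1)})
    threshold = min_fraction * len(reads)
    return sorted(kmer for kmer, n in per_read.items() if n >= threshold)
```

For example, on a handful of reads whose only shared sequence is a 3′ AGATCGGAAGAGC tail, known_contaminant_scan reports a TruSeq fraction of 1.0, longest_common_suffix returns AGATCGGAAGAGC, and the k-mer pass flags the 8-mers inside that tail. The first method only finds adapters already in the database, while the other two can surface unknown contaminants at the cost of needing a sample-size and threshold choice.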

fastp's existing evaluator.cpp already does passing-band overrepresentation analysis, so the infrastructure is partly there.

Prior art

Happy to help if you want to pursue this.
