DIAMOND Reference Database Generator

A small repository for generating and storing DIAMOND-formatted reference databases (.dmnd file extension) for use with diamond blastx, with customizable specificity.

Creating the database

First create and activate the environment:

micromamba create -f environment.yaml
micromamba activate diamond-db-creator

Then create a config.yaml file (like the example) with the list of all INSDC references you would like to include in your database. For example:

references:
  seg4-H1: U08903.1
  seg4-H2: CY005413.1
  seg3-H5N1: NC_007359.1

Note that each key (e.g. seg4-H1) becomes the prefix of the dataset name returned in the diamond blastx results.tsv. Because DIAMOND matches proteins, each CDS in a reference sequence receives its own identifier of the form id|CDS{i}; in Loculus, all matches to id|CDS{i} are mapped back to the sequence id.

Each key should be identical to the sequenceName it is assigned to, e.g. {segment}-{reference}. Alternatively, you can add lists of accepted matches to the config.accepted_dataset_matches field. For example, results.tsv may contain:

seqName	dataset	pident	...
MW874350.1	seg3-H5N1|CDS1	0.4829120176662018	...
MW874350.1	seg3-H1N1|CDS2	0.34937439846005774	...
MW874350.1	seg3-H3N2|CDS1	0.22552301255230126	...
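The mapping from per-CDS dataset names back to reference keys can be sketched in a few lines of Python. This is only an illustration, not the Loculus implementation: the helper names `reference_key` and `best_hits` are hypothetical, the columns elided as `...` in the excerpt are left out, and choosing the hit with the highest pident is an assumption about how one might pick a match.

```python
import csv
import io

# Excerpt shaped like the results.tsv example (tab-separated, extra columns omitted).
RESULTS_TSV = (
    "seqName\tdataset\tpident\n"
    "MW874350.1\tseg3-H5N1|CDS1\t0.4829120176662018\n"
    "MW874350.1\tseg3-H1N1|CDS2\t0.34937439846005774\n"
    "MW874350.1\tseg3-H3N2|CDS1\t0.22552301255230126\n"
)


def reference_key(dataset: str) -> str:
    """Drop the per-CDS suffix: 'seg3-H5N1|CDS1' -> 'seg3-H5N1'."""
    return dataset.split("|", 1)[0]


def best_hits(tsv_text: str) -> dict[str, tuple[str, float]]:
    """Return, per seqName, the reference key with the highest pident (assumed rule)."""
    best: dict[str, tuple[str, float]] = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        key = reference_key(row["dataset"])
        pident = float(row["pident"])
        current = best.get(row["seqName"])
        if current is None or pident > current[1]:
            best[row["seqName"]] = (key, pident)
    return best


print(best_hits(RESULTS_TSV))
# → {'MW874350.1': ('seg3-H5N1', 0.4829120176662018)}
```

Splitting on the first `|` keeps the sketch independent of how many CDS entries a reference has; it does assume that the config keys themselves never contain `|`.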

Then generate the database by running:

snakemake --config config_file=<path-to-config-file>
