Small repo to generate and store DIAMOND-formatted (.dmnd file extension) reference databases for diamond blastx with customizable specificity

loculus-project/diamond-reference-databases

DIAMOND Reference Database Generator

Small repo to generate and store DIAMOND-formatted (.dmnd file extension) reference databases for diamond blastx with customizable specificity.

Creating the database

First create and activate the environment:

micromamba create -f environment.yaml
micromamba activate diamond-db-creator

Then create a config.yaml file (like the example) listing all INSDC references you would like to include in your database. For example:

references:
  seg4-H1: U08903.1
  seg4-H2: CY005413.1
  seg3-H5N1: NC_007359.1

Note that each key (e.g. seg4-H1) becomes the prefix of the dataset name returned in the diamond blastx results.tsv. Because DIAMOND matches proteins, each CDS in a sequence receives its own identifier; in Loculus we map all sequences that match the protein id|CDS{i} to the sequence id. The keys should match the sequenceName they are assigned to, e.g. {segment}-{reference}. Alternatively, you can add lists of accepted matches to the config.accepted_dataset_matches field.

seqName	dataset	pident	...
MW874350.1	seg3-H5N1|CDS1	0.4829120176662018	...
MW874350.1	seg3-H1N1|CDS2	0.34937439846005774	...
MW874350.1	seg3-H3N2|CDS1	0.22552301255230126	...

Then run the Snakemake workflow, pointing it at your config file:

snakemake --config config_file=<path-to-config-file>
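
The mapping from per-CDS dataset names back to config keys, described in the note above, can be sketched in Python as follows. This is a minimal illustration and not the code Loculus uses: the helper names `reference_key` and `best_hits` are hypothetical, the sample rows are copied from the results.tsv excerpt above with the elided columns left out, and picking the hit with the highest pident is an assumption about how a best match might be chosen.

```python
import csv
import io

# Rows copied from the results.tsv excerpt above; the elided ("...") columns are omitted.
SAMPLE_TSV = (
    "seqName\tdataset\tpident\n"
    "MW874350.1\tseg3-H5N1|CDS1\t0.4829120176662018\n"
    "MW874350.1\tseg3-H1N1|CDS2\t0.34937439846005774\n"
    "MW874350.1\tseg3-H3N2|CDS1\t0.22552301255230126\n"
)


def reference_key(dataset: str) -> str:
    """Drop the per-CDS suffix, e.g. 'seg3-H5N1|CDS1' -> 'seg3-H5N1'."""
    return dataset.split("|", 1)[0]


def best_hits(tsv_text: str) -> dict[str, tuple[str, float]]:
    """Return {seqName: (config key, pident)} for the highest-pident row.

    Assumption: a higher pident means a better match. The real selection
    rule may differ.
    """
    best: dict[str, tuple[str, float]] = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        name = row["seqName"]
        pident = float(row["pident"])
        if name not in best or pident > best[name][1]:
            best[name] = (reference_key(row["dataset"]), pident)
    return best


if __name__ == "__main__":
    print(best_hits(SAMPLE_TSV)["MW874350.1"][0])  # seg3-H5N1
```

Because the suffix is stripped at the first `|`, every CDS of a reference collapses onto the same config key, which is why the keys should match the sequenceName they are assigned to.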
