nextclade-datasets

This repository hosts a GenSpectrum-maintained Nextclade dataset server, set up following the Nextclade dataset server maintenance docs: https://github.com/nextstrain/nextclade_data/blob/master/docs/dataset-server-maintenance.md.

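You can test the server in Nextclade Web by pasting

https://clades.nextstrain.org/?dataset-server=https://raw.githubusercontent.com/genspectrum/nextclade-datasets/main/data

into an incognito browser window. The dataset-server parameter points Nextclade Web at the data directory of this repository, so nothing has to run on your machine.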
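
To try datasets that have not yet been merged, the same URL can be built for another branch. The helper below is a minimal sketch: the function name nextclade_test_url and the branch argument are hypothetical conveniences, not part of this repository, and it assumes the branch keeps its datasets under data, as main does.

```shell
# Build a Nextclade Web URL that loads datasets from a given branch of this repository.
# Assumption: the branch stores its datasets under data/, like main.
nextclade_test_url() {
    branch="${1:-main}"
    server="https://raw.githubusercontent.com/genspectrum/nextclade-datasets/${branch}/data"
    printf 'https://clades.nextstrain.org/?dataset-server=%s\n' "$server"
}

nextclade_test_url             # URL for main, identical to the one above
nextclade_test_url my-branch   # URL for a hypothetical branch called my-branch
```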

How to add new datasets?

  1. Create a dataset following Nextclade's instructions.
  2. Update index.json: it should include the details from each dataset's pathogen.json. In addition, index.json expects datasets to be versioned. For simplicity, set the version to unreleased and keep each dataset in a subdirectory called unreleased.

To move nextclade datasets into a subfolder called unreleased you can run:

DATASET=S-2to5
DEST="$DATASET/unreleased"
mkdir -p "$DEST"
rsync -a --remove-source-files --exclude 'unreleased/' "$DATASET"/ "$DEST"/
  3. Zip the contents of the dataset into dataset.zip. This is what Nextclade downloads and unzips prior to use.
cd "$DATASET/unreleased"
zip -r dataset.zip *

Note that steps 2 and 3 are performed automatically by the CI when you create an official nextclade dataset, using the rebuild script.
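
When doing the zip step by hand, it can help to confirm first that the expected files sit at the top level of the unreleased directory, since its contents are what end up in dataset.zip. The following is a sketch only: check_dataset_dir is a hypothetical helper, and it checks just pathogen.json by default because that is the only dataset file this README names. Pass further file names as arguments if your dataset needs them.

```shell
# Report whether a dataset directory contains the files expected before zipping.
# Usage: check_dataset_dir DIR [FILE...]
# With no FILE arguments, only pathogen.json is checked (an assumption, see above).
check_dataset_dir() {
    dir="$1"
    shift
    [ "$#" -gt 0 ] || set -- pathogen.json
    status=0
    for f in "$@"; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            status=1
        fi
    done
    return "$status"
}

DATASET=S-2to5
if check_dataset_dir "$DATASET/unreleased"; then
    echo "$DATASET/unreleased looks ready to zip"
fi
```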

Download H5N1 datasets as follows:

for i in {1..8}; do
    nextclade_dataset_name="flu/h5n1/seg$i"
    nextclade_dataset_server="https://raw.githubusercontent.com/genspectrum/nextclade-datasets/main/data"
    nextclade3 dataset get --name "$nextclade_dataset_name" --server "$nextclade_dataset_server" --output-dir "output$i"
done
