Switch to using pathoplexus for nextclade #323
corneliusroemer wants to merge 20 commits into master
Add PPX accessions in excludes/includes via script
| sequences_url="https://lapis.pathoplexus.org/mpox/sample/unalignedNucleotideSequences?downloadAsFile=true&downloadFileBasename=mpox_nuc_2025-03-19T1422&versionStatus=LATEST_VERSION&isRevocation=false&dataFormat=fasta&compression=zstd", | ||
| metadata_url="https://lapis.pathoplexus.org/mpox/sample/details?downloadAsFile=true&downloadFileBasename=mpox_metadata_2025-03-19T1422&versionStatus=LATEST_VERSION&isRevocation=false&dataFormat=tsv&compression=zstd", |
[question, not review]
Thinking about how we would use ppx in mpox for canonical ingest, and assuming that all NCBI data is in ppx (?), we would be dropping the fetch_from_ncbi.smk code and fetching TSV & FASTA from an API call similar to these lines. We'd then (?) convert these to a data/ppx.ndjson structure and curate the data as normal. Is this about right? Is this something we should be doing?
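For concreteness, the conversion step being asked about might look roughly like the following. This is only a sketch under assumptions: the ID column name `accessionVersion`, the output field `sequence`, and both function names are hypothetical and not taken from this PR, and decompressing the zstd downloads is left out.

```python
import csv
import io
import json
from typing import Dict, Iterator, List


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {record id: sequence}.

    The record id is taken to be the first whitespace-delimited token
    of each header line.
    """
    chunks: Dict[str, List[str]] = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            tokens = line[1:].split()
            current = tokens[0] if tokens else ""
            chunks[current] = []
        elif current is not None:
            chunks[current].append(line)
    return {name: "".join(parts) for name, parts in chunks.items()}


def to_ndjson(
    metadata_tsv: str,
    fasta: str,
    id_column: str = "accessionVersion",  # assumed column name
) -> Iterator[str]:
    """Yield one JSON line per metadata row that has a matching sequence."""
    sequences = read_fasta(fasta)
    reader = csv.DictReader(io.StringIO(metadata_tsv), delimiter="\t")
    for row in reader:
        sequence = sequences.get(row[id_column])
        if sequence is None:
            # Rows without a sequence are skipped in this sketch.
            continue
        record = dict(row)
        record["sequence"] = sequence
        yield json.dumps(record)
```

Writing the yielded lines to a file such as data/ppx.ndjson would then give the usual curation steps one record per line to work on.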
[also question, but also kinda review]
How often is the downloadFileBasename param (here downloadFileBasename=mpox_metadata_2025-03-19T1422) expected to change? And is it always expected to match between sequences_url and metadata_url? Unless it is exceptionally stable, I would be inclined to pull it out into a single param and interpolate it into both URLs.
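A minimal sketch of what that could look like, with the shared timestamp kept in one place and interpolated into both URLs. The names LAPIS_BASE, DOWNLOAD_STAMP and lapis_url are hypothetical and not from this PR; the endpoints and query parameters are copied from the diff lines above.

```python
from urllib.parse import urlencode

# Hypothetical names; only the URL pieces themselves come from the diff.
LAPIS_BASE = "https://lapis.pathoplexus.org/mpox/sample"
DOWNLOAD_STAMP = "2025-03-19T1422"  # the single value that would need updating


def lapis_url(endpoint: str, basename_prefix: str, data_format: str) -> str:
    """Build a download URL with the shared timestamp interpolated."""
    query = urlencode(
        {
            "downloadAsFile": "true",
            "downloadFileBasename": f"{basename_prefix}_{DOWNLOAD_STAMP}",
            "versionStatus": "LATEST_VERSION",
            "isRevocation": "false",
            "dataFormat": data_format,
            "compression": "zstd",
        }
    )
    return f"{LAPIS_BASE}/{endpoint}?{query}"


sequences_url = lapis_url("unalignedNucleotideSequences", "mpox_nuc", "fasta")
metadata_url = lapis_url("details", "mpox_metadata", "tsv")
```

Note that the two basenames in the diff share only the timestamp (mpox_nuc_... versus mpox_metadata_...), so the sketch factors out the timestamp and leaves the prefix as an argument.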
I think https://github.com/nextstrain/rsv/pull/87/files has answered my question here
| rule join_metadata: | ||
| input: | ||
| metadata="data/metadata.tsv", | ||
| stats=rules.filter_nextclade_results.output.stats, |
| metadata="results/decent_metadata_raw.tsv", | ||
| output: | ||
| metadata="results/decent_metadata.tsv", | ||
| run: |
| # --output {output.tree} | ||
| # """ | ||
| """ | ||
| ~/code/pree/rust/target/release/rust_parsimony build \ |
Just noting this as a merge blocker: the command calls a binary at a local path under the home directory.
| - dnachun | ||
| dependencies: | ||
| - augur | ||
| - augur <= 27 # Augur 27-29.0 have issues with ancestral reconstruction |
Are there specific bug reports to link to here?
Description of proposed changes
Related issue(s)
Checklist