Download a SARS-CoV-2 genome data package
Download sequences for SARS-CoV-2 GenBank genomes by taxon or lineage
Download a SARS-CoV-2 GenBank genome data package by taxon name or accession. The default data package includes genome sequences and primary metadata. Options are available to include CDS and protein FASTA sequences, as well as annotation and BioSample metadata. Refer to the datasets command-line interface (CLI) reference for all available flags and subcommands.
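As a sketch of those options, the command below requests a single genome together with the additional files. It assumes the --include flag accepts a comma-separated list of genome, cds, protein, annotation and biosample; confirm the accepted values with datasets download virus genome accession --help, since flags can change between CLI versions. The output file name is hypothetical.

```shell
#!/usr/bin/env bash
# Request extra files alongside the genome sequences.
# Assumption: --include takes a comma-separated list; listing "genome"
# keeps the genome sequences, since --include replaces the default set.
include="genome,cds,protein,annotation,biosample"

if command -v datasets >/dev/null 2>&1; then
  datasets download virus genome accession NC_045512.2 \
    --include "$include" \
    --filename nc_045512_with_annotation.zip   # hypothetical output name
else
  echo "datasets CLI not found on PATH" >&2
fi
```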
If you want to download a virus data package for all SARS-CoV-2 genomes, we recommend using the datasets CLI to request a cached virus data package. These packages are highly compressed, which makes for a faster, more reliable download. Cached packages are available only for the following sets of SARS-CoV-2 GenBank genomes:
- All SARS-CoV-2 genomes
- Human host only
- Human host only and complete
- Complete only
- Annotated only
Download a cached virus data package of all SARS-CoV-2 genomes by taxon
You can use the organism name or NCBI Taxonomy ID (2697049).
datasets download virus genome taxon SARS-CoV-2 --filename sars_cov_2.zip
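Because the taxon can be given either by name or by NCBI Taxonomy ID, the sketch below requests the same package using 2697049 and then unpacks it. It assumes datasets and unzip are on your PATH; the output directory name is simply the archive name without the .zip suffix.

```shell
#!/usr/bin/env bash
# Request the cached package by NCBI Taxonomy ID instead of organism name.
taxid=2697049
zipfile="sars_cov_2.zip"
outdir="${zipfile%.zip}"   # strips the suffix, giving "sars_cov_2"

if command -v datasets >/dev/null 2>&1; then
  # Unzip only if the download succeeded.
  datasets download virus genome taxon "$taxid" --filename "$zipfile" &&
    unzip -o -q "$zipfile" -d "$outdir"
else
  echo "datasets CLI not found on PATH" >&2
fi
```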
Download a cached virus data package of all SARS-CoV-2 complete genomes by taxon
datasets download virus genome taxon SARS-CoV-2 --complete-only --filename sars_cov_2_complete.zip
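The other filtered sets listed above can presumably be requested the same way, by adding filter flags to the taxon command. The sketch below wraps the command in a small helper, download_set (a hypothetical name), that prints the command instead of running it when datasets is not installed. Only --complete-only appears on this page; --host human and --annotated are assumptions about the current CLI, so check datasets download virus genome taxon --help before relying on them. The output file names are hypothetical.

```shell
#!/usr/bin/env bash
# download_set NAME [FLAGS...]: fetch one filtered set into NAME.zip.
# Falls back to printing the command (a dry run) if datasets is missing.
download_set() {
  local name=$1
  shift
  if command -v datasets >/dev/null 2>&1; then
    datasets download virus genome taxon SARS-CoV-2 "$@" --filename "$name.zip"
  else
    echo "datasets download virus genome taxon SARS-CoV-2 $* --filename $name.zip"
  fi
}

# --host and --annotated are assumed flags; --complete-only is from this page.
download_set sars_cov_2_human --host human
download_set sars_cov_2_human_complete --host human --complete-only
download_set sars_cov_2_complete --complete-only
download_set sars_cov_2_annotated --annotated
```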
Download a custom set of SARS-CoV-2 genomes by accession(s)
To download multiple accessions, list them on the command line separated by spaces. Alternatively, use the --inputfile flag and provide a text file with one accession per line.
datasets download virus genome accession NC_045512.2
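For example, the sketch below writes a two-line accession file and shows both ways of requesting the same genomes. MN908947.3 (the Wuhan-Hu-1 GenBank record) is included only as an illustrative second accession, and the file and archive names are hypothetical.

```shell
#!/usr/bin/env bash
# Write one accession per line for use with --inputfile.
printf '%s\n' NC_045512.2 MN908947.3 > accessions.txt

if command -v datasets >/dev/null 2>&1; then
  # Accessions listed directly on the command line, separated by spaces...
  datasets download virus genome accession NC_045512.2 MN908947.3 --filename custom_cli.zip
  # ...or read from the text file.
  datasets download virus genome accession --inputfile accessions.txt --filename custom_file.zip
else
  echo "datasets CLI not found on PATH" >&2
fi
```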
Download by SARS-CoV-2 lineage
Download SARS-CoV-2 GenBank genomes for specific lineages, as classified by pangolin.
datasets download virus genome taxon SARS-CoV-2 --lineage P.1 --filename SARS-CoV-2-P.1.zip
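To fetch several lineages, one approach is to loop over them and reuse the naming pattern from the command above, producing one archive per lineage. This is a sketch: lineage_zip is a hypothetical helper, and B.1.1.7 is an illustrative second lineage.

```shell
#!/usr/bin/env bash
# lineage_zip LINEAGE: print the archive name used for that lineage.
lineage_zip() { printf 'SARS-CoV-2-%s.zip\n' "$1"; }

lineages=(P.1 B.1.1.7)   # B.1.1.7 is illustrative

for lineage in "${lineages[@]}"; do
  zipfile=$(lineage_zip "$lineage")
  if command -v datasets >/dev/null 2>&1; then
    datasets download virus genome taxon SARS-CoV-2 --lineage "$lineage" --filename "$zipfile"
  else
    echo "datasets CLI not found; skipping $zipfile" >&2
  fi
done
```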