sars-like-cov's Introduction

This is a one-off Nextstrain build for SARS-like coronaviruses, visible at nextstrain.org/groups/blab/sars-like-cov.

Data

The SARS-CoV-2 coronavirus genomes were generously shared by scientists at the Shanghai Public Health Clinical Center & School of Public Health, Fudan University (WH-Human_1), at the National Institute for Viral Disease Control and Prevention, China CDC, Beijing, China (Wuhan/IVDC-HB-01/2019, Wuhan/IVDC-HB-05/2019, IVDC-HB-04/2020), at the Institute of Pathogen Biology, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China (Wuhan/IPBCAMS-WH-01/2019), and at the Wuhan Institute of Virology, Chinese Academy of Sciences, Wuhan, China (Wuhan/WIV04/2019). The related SARS-like bat virus bat/Yunnan/RaTG13/2013 was shared by scientists at the Wuhan Institute of Virology, Chinese Academy of Sciences, Wuhan, China (Zhu et al.), and related SARS-like pangolin viruses were shared by scientists at the State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing, China and at the State Key Laboratory of Emerging Infectious Diseases and Centre of Influenza Research, University of Hong Kong, Hong Kong (Lam, Cao et al.). We gratefully acknowledge the Authors, Originating and Submitting laboratories of the genetic sequence and metadata made available through GISAID on which this research is based.

Background data for this build were sourced from GenBank via ViPR. We downloaded all SARS-like coronavirus sequences longer than 5000 bases. These sequences are available in the repo at data/sequences.fasta.
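The length cutoff described above can be illustrated with a minimal sketch. This is not the script used for the build; the helper names `read_fasta` and `filter_by_length` are hypothetical, and the only detail taken from the text is the "more than 5000 bases" threshold.

```python
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(lines: Iterable[str], min_length: int = 5000) -> Dict[str, str]:
    """Keep only records whose sequence is strictly longer than min_length."""
    return {name: seq for name, seq in read_fasta(lines) if len(seq) > min_length}


# Hypothetical usage against the file named in this README:
# with open("data/sequences.fasta") as handle:
#     kept = filter_by_length(handle)
#     print(len(kept), "sequences longer than 5000 bases")
```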

The following viruses are not included in this repo because they are protected by the GISAID terms of sharing, so users will need to supply these genomes themselves. Please add them as additional strains in data/sequences.fasta. Metadata for these viruses already exists in data/metadata.tsv.

  • bat/Yunnan/RaTG13/2013
  • bat/Yunnan/RmYN01/2019
  • bat/Yunnan/RmYN02/2019
  • pangolin/Guangdong/1/2020
  • pangolin/Guangdong/P2S/2019
  • pangolin/Guangxi/P5E/2017
  • pangolin/Guangxi/P4L/2017
  • pangolin/Guangxi/P5L/2017
  • pangolin/Guangxi/P1E/2017
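Before running the build, it may help to check which of the strains listed above are still missing from data/sequences.fasta. The sketch below is one way to do that; it assumes the FASTA headers use exactly the strain names listed above (which is an assumption, not something this README states), and the function names are hypothetical.

```python
from typing import Iterable, List, Set

# Strain names copied from the list above.
REQUIRED_STRAINS = [
    "bat/Yunnan/RaTG13/2013",
    "bat/Yunnan/RmYN01/2019",
    "bat/Yunnan/RmYN02/2019",
    "pangolin/Guangdong/1/2020",
    "pangolin/Guangdong/P2S/2019",
    "pangolin/Guangxi/P5E/2017",
    "pangolin/Guangxi/P4L/2017",
    "pangolin/Guangxi/P5L/2017",
    "pangolin/Guangxi/P1E/2017",
]


def fasta_names(lines: Iterable[str]) -> Set[str]:
    """Collect record names (the first token after '>') from FASTA lines."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                names.add(parts[0])
    return names


def missing_strains(lines: Iterable[str], required=REQUIRED_STRAINS) -> List[str]:
    """Return the required strains that do not appear as FASTA record names."""
    present = fasta_names(lines)
    return [strain for strain in required if strain not in present]


# Hypothetical usage:
# with open("data/sequences.fasta") as handle:
#     for strain in missing_strains(handle):
#         print("still missing:", strain)
```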

Building

After updating the data/sequences.fasta file, the entire build can be regenerated by running

snakemake -p

with a local Nextstrain installation or by running

nextstrain build .

with a containerized Nextstrain installation.

The resulting output JSON at auspice/sars-like-cov.json can be visualized by running auspice view --datasetDir auspice (local installation) or nextstrain view auspice/ (containerized installation).

sars-like-cov's People

Contributors

emmahodcroft, trvrb

sars-like-cov's Issues

There are several questions about the data and logic of this project

First, thank you for providing the metadata and code.
After setting up the Nextstrain environment, I ran this code successfully, and I have several questions I would like help with.

  1. Divergence is computed with 'mutation_length' and 'branch_length'. What is the relationship between them, and how are mutation_length and branch_length obtained?
  2. Why are there so many nodes named "NODE_00000XX" in the output JSON, even though they have a div value? And why is the number of nodes in the final output much smaller than the number of records in the metadata?
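On the second question, one way to investigate is to count tips and internal nodes in the output JSON directly. In Nextstrain builds, names of the form NODE_... are normally given to internal (inferred ancestral) nodes rather than to sampled viruses, so they would not be expected to appear in the metadata. The sketch below assumes the file follows the nested Auspice layout, with a top-level "tree" key and a "children" list on each internal node; the function names are hypothetical.

```python
import json
from typing import Dict, Iterator, List, Tuple


def walk(root: Dict) -> Iterator[Dict]:
    """Yield every node of a nested tree, parents before children."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        # Push children in reverse so they are visited in their listed order.
        stack.extend(reversed(node.get("children", [])))


def summarize(tree: Dict) -> Tuple[List[str], List[str]]:
    """Split node names into tips (no children) and internal nodes."""
    tips, internal = [], []
    for node in walk(tree):
        (internal if node.get("children") else tips).append(node["name"])
    return tips, internal


# Hypothetical usage, assuming a top-level "tree" key in the output JSON:
# with open("auspice/sars-like-cov.json") as handle:
#     tips, internal = summarize(json.load(handle)["tree"])
#     print(len(tips), "tips;", len(internal), "internal nodes")
```

Comparing the tip names against the strain column of data/metadata.tsv would then show which metadata records did not make it into the final tree.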

Add strains from Eliot et al.

We should update this build to include the following GenBank accessions from Eliot et al.:

  • MZ937000 (BANAL-52)
  • MZ937001 (BANAL-103)
  • MZ937002 (BANAL-116)
  • MZ937003 (BANAL-236)
  • MZ937004 (BANAL-247)

These data don't appear to be publicly posted on GenBank yet, however.

Add strains from Starr et al.

I don't know if this is the most appropriate place to document this request (or maybe the betacoronavirus build is better?), but Starr et al. describes the evolution of sarbecoviruses, including the ACE2 binding capability of a virus found in Africa. This virus, BtKY72, is one of two viruses in Tyler's tree that originate from Europe or Africa. BtKY72 appears in the beta-cov build mentioned above but it doesn't appear in this build. The other most closely related strain, BM48-31/BGR/2008, is from Europe and does not appear in either this build or the beta-cov build.

Maybe a better solution is to encourage/help Tyler host his own community build for his existing tree, but alternately, we could add the following strains to this build:

A few collection dates for the SARS epidemic

Hi, I found two websites with some collection dates for the SARS epidemic:
http://covdb.popgenetics.net/v2/index/searchlist
https://era.ed.ac.uk/bitstream/handle/1842/5859/Raghwani2012.pdf?sequence=3&isAllowed=y

GZ02 2003-02-11
GD01 2003-04-13
TW1 2003-03-30
SZ3 2003-05-15
SZ16 2003-05-15
CUHK-AG01 2003-03-15
GZ0401 2003-12-22
HC/SZ/79/03 2003-12-29
HC/GZ/81/03 2003-12-31
HC/GZ/32/03 2003-12-12
HC/SZ/61/03 2003-12-20
civet020 2004-01-02
PC4-227 2004-01-05
CFB/SZ/94/03 2003-12-29
PC4-13 2004-01-10
PC4-136 2004-01-05
GZ0402 2004-01-05
civet010 2004-01-02
B039 2004-01-05
civet007 2004-01-02
A022 2004-01-05
CUHK-W1 2003-03-15
GZ50 2003-02-20
CUHK-Su10 2003-03-05
Tor2 2003-04-13
Sin2679 2003-03-15
BJ01 2003-03-07
CUHK-AG02 2003-03-23
HKU-39849-recSARS-CoV-HKU 2003-04-16
Frankfurt-1 2003-03-15
Sin3725V 2003-04-15
Sin3765V 2003-04-15
ZJ0301-from-China 2003-04-21
BJ02 2003-04-21
Sin845 2003-04-07
BJ03 2003-06-03
WHU 2003-05-03
ZJ01 2003-05-12
GD69 2003-08-03
GZ-B 2003-03-15
GZ-A 2003-02-22
LC2 2003-03-15
HC/SZ/DM1/03 2003-10-22
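If anyone wants to merge the strain/date pairs above into data/metadata.tsv, a small parser along these lines might help. It is only a sketch: the function name `parse_strain_dates` is hypothetical, and it assumes strain names contain no whitespace and that every date is in YYYY-MM-DD form.

```python
from datetime import date
from typing import Dict


def parse_strain_dates(text: str) -> Dict[str, str]:
    """Parse whitespace-separated 'strain date' pairs into {strain: date}."""
    tokens = text.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected an even number of tokens (strain, date pairs)")
    pairs = {}
    for strain, day in zip(tokens[0::2], tokens[1::2]):
        # fromisoformat raises ValueError if the token is not a valid date.
        date.fromisoformat(day)
        pairs[strain] = day
    return pairs


# Example with three of the pairs listed above:
example = parse_strain_dates("GZ02 2003-02-11 GD01 2003-04-13 HC/SZ/79/03 2003-12-29")
```

Because the parser splits on any whitespace, it works whether the pairs are on one line or one pair per line.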

You can see the time tree I obtained after removing a few outlier sequences here: http://babarlelephant.free-hoster.net/dist/index.html?sarscov1.json

broken link in footer

The link in the description is to /sars-like-cov?clade=Wuhan, which (I'm guessing) works locally but doesn't work on nextstrain.org. It should include the groups prefix, like so: /groups/blab/sars-like-cov?clade=Wuhan

Additionally, I think the filter names have changed since a previous build, so I believe the desired link is /groups/blab/sars-like-cov?f_virus_type=Wuhan_coronavirus

Just curious

What does RaTG13 in BetaCoV/bat/Yunnan/RaTG13/2013 stand for? I think Ra stands for Rhinolophus affinis, but what is TG13 then?
