Deleted SARS-CoV-2 sequences from start of Wuhan outbreak offer clues
Coronaviruses collect changes in their genetic sequences as they replicate, and studying these changes in the sequences collected over time helps scientists trace the history of the virus.
Bloom therefore examined the genomic sequence reports of the virus from people infected early on to reveal patterns of its evolution. He didn’t find much at first.
Then he found an article referring to a sequence data set he hadn’t seen mentioned anywhere else. However, when he searched for these sequences in the most likely online data archives, he did not find them.
He knew that researchers can request the removal of sequences they uploaded to the archives. Realizing that the data might still be stored online, he deduced the corresponding URLs and found files related to the sequences still present on the Google Cloud.
“I was able to determine that the deleted data corresponded to a study that partially sequenced 45 nasopharyngeal samples from [Wuhan] outpatients with suspected COVID-19 at the start of the epidemic,” he tweeted.
Combined with other clues, he eventually found 241 data files that had been uploaded to and then deleted from the database. Taken together, these files represented portions of 34 early SARS-CoV-2 samples that were not previously known. But each file included only a portion of the complete sequencing information for each sample.
In the end, Bloom reconstructed enough data to examine the partial sequences of 13 early cases of SARS-CoV-2.
What the sequences show about the start of the Wuhan epidemic
The 13 reconstructed sequences do not transform what is known about the early stages of the Wuhan outbreak, and information is lacking on when and where the samples were collected. Still, they help fill in some details that bring us closer to identifying the original spillover event.
First, the data adds to other evidence that the seafood market in Wuhan was not where the virus passed from animals to humans.
Nature News wrote: “The first virus sequences from Wuhan come from individuals linked to the city’s Huanan seafood market in December 2019, which was initially believed to be where the coronavirus first passed from animals to humans. But the seafood market sequences are further removed from the closest relatives of SARS-CoV-2 in bats – the most likely ultimate origin of the virus – than subsequent sequences, including one collected in the United States.”
Columbia University epidemiologist Dr. W. Ian Lipkin told The Washington Post that Bloom’s paper offers “evidence of what many of us speculated – that the virus was circulating before the market outbreak. The retraction of sequence data is unprecedented and needs to be corrected.”
Lipkin told USA Today that “this line of inquiry can help us determine the origin of the virus and reconstruct how it spread in the early days of the pandemic.”
Dr. Sudhir Kumar, an evolutionary geneticist at Temple University, told Nature News: “It seemed to me that the Wuhan market was one of the first superspreading events.”
Kumar added that the sequences “suggest that SARS-CoV-2 developed great diversity in the early stages of the pandemic in China, including Wuhan.”
Scientists need to find more of these missing pieces from the first outbreak to draw conclusions about the origins of the virus.
“Maybe our picture of what was present at the start in Wuhan from what was sequenced could be somewhat skewed,” Bloom told the New York Times.
No direct evidence for either theory of origin
Bloom is one of 17 experts who wrote a letter published May 13 in Science calling for an investigation into the onset of the pandemic, with a more balanced view considering all possibilities, including transmission from animals to humans, which has occurred with many new infectious diseases, as well as a laboratory accident.
These new data don’t tip the scales toward one theory or the other, he said.
“These data provide no direct evidence for a lab accident or natural zoonosis,” Bloom said via email, with further explanation in a Twitter thread. “However, they do indicate the importance of continuing to search for new data on the origins and early spread of SARS-CoV-2.”
He told Science that it is vital for scientists to put aside prejudices about the origins of the virus and to study this question with transparency:
“So many people have agendas and preconceptions on this topic that if you open your mouth on the topic, someone is going to take what you said to support or reject a particular narrative. So the choices are either to say nothing at all, which I don’t think is helpful or productive, or to just try to draw the possible conclusions and make them as transparent as possible. Whether people like [my new study] or don’t like it, whether they agree with the interpretation or disagree with the interpretation, they can at least go download it and repeat it themselves.”
Reasons for deletion
In a media statement, the National Institutes of Health, which operates the archive that once housed the sequence data, explained the process of removing the sequences at the request of the scientist who submitted them.
“The requester indicated that the sequence information had been updated, was submitted to another database and wanted the data to be deleted from SRA (Sequence Read Archive) to avoid version control issues,” the NIH said in its statement, as reported by USA Today. “Submitting investigators own the rights to their data and can request that the data be removed.”
These reasons were cited in an NIH email sent to Bloom, which he included in his updated preprint. However, Bloom noted that he had found no indication that the sequences had in fact been uploaded to another database, as the authors claimed.
Newly discovered, but not new
Nature News reported:
Stephen Goldstein, a virologist at the University of Utah in Salt Lake City, points out that the sequences Bloom recovered were not hidden: they are described in detail, with enough information about the sequences to know their evolutionary relationship with other early SARS-CoV-2 sequences, in the Small paper. “I don’t think this preprint tells us much new, but it brings to the fore the sequence data that has been publicly available, albeit under the radar,” Goldstein said.
Bloom says it doesn’t matter that the data is not new; rather, the point is that people analyzing other SARS-CoV-2 sequences could not find them.
“In the revised manuscript, I ... specify that I cannot determine the motivations of the authors. However, I note that I did not find any websites with updated data, and the practical consequence of the deletion was that no one was aware of the existence of the data,” Bloom tweeted.
Bloom told the BBC’s Science in Action, “I hope that if this article contributes to that discussion, it reminds scientists that we perform better if we search for data and try to analyze data, and we perform less well if we yell at each other about different positions with very little evidence.”
To that end, he is searching for other sequences from the start of the pandemic, and he hopes other scientists will join the effort.
The Wall Street Journal reported that other scientists share his interest.
“This makes us wonder if there are other sequences like this that have been purged,” said Dr. Vaughn S. Cooper, evolutionary biologist at the University of Pittsburgh.
Bloom posted the data he found online to encourage others to do their own analysis.
“We really need to look carefully and see if there is any other early information about the sequences that was not found,” he told The Wall Street Journal. “I intend to go through all the first preprints I can find on SARS-CoV-2 and see if they describe data that is not in the databases.”
He keeps an open mind about how what is found can change our understanding of the pandemic, just like other scientists.
“We should, however, be prepared to revise these ideas and hypotheses further if and when earlier sequence data emerges,” Dr. Sergei Pond, professor of biology at Temple University, tweeted, calling Bloom’s preprint “an important piece of forensic bioinformatics.” He added: “I wouldn’t be surprised if these revisions are very significant (e.g., timing of introduction).”
Bloom argues that while understanding the origins and early spread is a scientific matter, policymakers must also help by enabling better investigations and greater transparency in the search and analysis of all possible data.
“We need to understand how SARS-CoV-2 started, because the answer will have implications for mitigating pandemics in the future,” Bloom said. “The question of the origins of COVID-19 is not going away. It’s important that scientists look at the problem to make sure we’ve done everything we can to explore this.”