COVID-19 database was edited in Wuhan

Published on 25 June 2021

Old samples have re-emerged in Google Cloud

About a year ago, genetic sequences from more than 200 virus samples from early cases of Covid-19 in Wuhan disappeared from an online scientific database. Some of them have now apparently turned up on Google Cloud.

Jesse Bloom, a virologist at the Fred Hutchinson Cancer Research Center in Seattle, found the samples and says they provide new information for discerning when and how the virus may have spilt over from a bat or another animal into humans.

According to Apple Fanzine, the New York Times, the new analysis, released on Tuesday, bolsters earlier suggestions that a variety of coronaviruses may have been circulating in Wuhan before the initial outbreaks linked to animal and seafood markets in December 2019. Bloom's study neither strengthens nor discounts the hypothesis that the pathogen leaked out of a famous Wuhan lab. But it does raise questions about why original sequences were deleted and suggest that there may be more revelations to recover from the far corners of the internet.

The genetic sequences of viral samples hold crucial clues about how SARS-CoV-2 shifted to our species from another animal, most likely a bat. Most precious of all are sequences from early in the pandemic because they take scientists closer to the original spillover event.

Bloom had been reviewing the genetic data published by various research groups when he came across a March 2020 study with a spreadsheet that included information on 241 genetic sequences collected by scientists at Wuhan University. The spreadsheet indicated that the scientists had uploaded the sequences to an online database called the Sequence Read Archive, managed by the U.S. government's National Library of Medicine.

When Bloom looked for the Wuhan sequences in the database earlier this month, his only result was "no item found". Puzzled, he went back to the spreadsheet for any further clues. It indicated that the 241 sequences had been collected by a scientist named Aisi Fu at Renmin Hospital in Wuhan.

Searching the medical literature, Bloom eventually found another study posted online in March 2020 by Dr. Fu and colleagues, describing a new experimental test for SARS-CoV-2. The Chinese scientists published it in a scientific journal three months later. In that study, the scientists wrote that they had looked at 45 samples from nasal swabs taken "from outpatients with suspected COVID-19 early in the epidemic". They then searched for a portion of SARS-CoV-2's genetic material in the swabs. The researchers did not publish the actual sequences of the genes they fished out of the samples. Instead, they only published some mutations in the viruses.

A number of clues indicated to Bloom that the samples were the source of the 241 missing sequences. The papers included no explanation as to why the sequences had been uploaded to the Sequence Read Archive, only to disappear later.

Perusing the archive, Bloom figured out that many of the sequences were stored as files on Google Cloud. Each sequence was contained in a file in the cloud, and the names of the files all shared the same basic format, he reported. Bloom swapped the code for a missing Wuhan sequence into that file name format. Suddenly, he had the sequence.
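The file-name trick described above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the article does not give the real bucket path or naming scheme, so the URL template, the `candidate_urls` helper, and the accession codes below are all hypothetical placeholders, not Bloom's actual method.

```python
# Hypothetical template: the real cloud path is NOT given in the article.
# The only idea borrowed from the text is that every file name shares one
# format, with the sequence's code as the part that changes.
URL_TEMPLATE = "https://storage.example.com/archive/{accession}/{accession}.sra"


def candidate_urls(accessions, template=URL_TEMPLATE):
    """Fill the shared file-name template with each sequence code."""
    return [template.format(accession=code) for code in accessions]


# Placeholder codes for illustration only; these are not real accessions.
missing_codes = ["SRR0000001", "SRR0000002"]

for url in candidate_urls(missing_codes):
    print(url)
```

Each printed address is merely a guess at where a file might sit; whether anything is actually stored there would have to be checked separately.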

He managed to recover 13 sequences from the cloud this way. With this new data, Bloom looked back once more at the early stages of the pandemic. He combined the 13 sequences with other published sequences of early coronaviruses, hoping to make progress on building the family tree of SARS-CoV-2. Working out all the steps by which SARS-CoV-2 evolved from a bat virus has been a challenge because scientists still have a limited number of samples to study.

Some of the earliest samples come from the Huanan Seafood Wholesale Market in Wuhan, where an outbreak occurred in December 2019. But those market viruses actually have three extra mutations that are missing from SARS-CoV-2 samples collected weeks later.

The later viruses look more like coronaviruses found in bats, supporting the idea that there was some early lineage of the virus that did not pass through the seafood market. Bloom found that the deleted sequences he recovered from the cloud also lack those extra mutations. "They're three steps more similar to the bat coronaviruses than the viruses from the Huanan fish market", Bloom said.

Bloom thinks that by the time SARS-CoV-2 reached the market, it had been circulating for a while in Wuhan or beyond. The market viruses, he argued, aren't representative of the full diversity of coronaviruses already loose in late 2019.

Last modified on 25 June 2021