No single research institute or country can stop the COVID-19 pandemic on its own, so scientists have been finding new ways to collaborate with each other and with public health authorities and governments around the world.
How does sharing data help?
At the beginning of the outbreak, scientists knew little or nothing about the new coronavirus, now called SARS-CoV-2. To start making sense of this virus, researchers around the world began to gather, analyse and share a wide range of data types (see Table).
| Data type | What does it allow researchers to do? |
|---|---|
| Genomic data | Understand the origins of the virus and monitor the spread of infection |
| Protein structure and function data | Understand how the virus interacts with other molecules, such as those used by the virus to enter human lung cells |
| Microscopy data | Understand how the virus damages human cells and causes disease |
| Chemical compound and drug target data | Discover potential treatments |
| Clinical data | Understand how and why people become ill, and why they recover or die from COVID-19 |
| Epidemiological data | Monitor and understand the spread of infection through populations |
| Scientific literature | See what studies have already been carried out and what they found |
How are scientific data shared?
Even when not working through a pandemic, researchers are used to sharing their experimental data through public databases, such as the ones maintained by EMBL’s European Bioinformatics Institute (EMBL-EBI) in Europe or the National Center for Biotechnology Information (NCBI) in the United States. By making their data public, scientists enable other researchers to perform their own analyses and explore a wider range of questions. This not only allows more researchers to make more discoveries, but also lets independent research groups replicate scientific findings, which is an essential part of accumulating knowledge.
Since the start of the pandemic, one of the greatest challenges for the scientific community has been bringing together new data about the virus, which are coming from hundreds of laboratories around the world. In addition, each data type is usually stored in its own dedicated database, so creating a “one-stop-shop” for multiple data types requires a large effort and much collaboration.
Several such data-sharing platforms existed before the current pandemic, including GISAID (Global Initiative on Sharing All Influenza Data), which launched in 2008. Created to improve the sharing of influenza data, GISAID plays an essential role in data sharing among WHO Collaborating Centres and National Influenza Centres. It has become one of the best-known and most widely used scientific data-sharing platforms today.
This year, several new data platforms have emerged that are specifically designed to hold information about the new coronavirus. One of these is the COVID-19 Data Portal, launched by EMBL’s European Bioinformatics Institute. It is an online collaborative space for researchers and healthcare professionals working to understand the virus and the pandemic.
Funded by the European Commission and created in collaboration with multiple European partners, the portal allows researchers from all over the world to submit their COVID-19 data, and to access other publicly available data. The portal also offers data analysis and visualisation tools. The idea is that this portal and other scientific collaborations like it will speed up the race to understand the virus and develop diagnostic tools, medical treatments and vaccines.
What can we learn from data sharing?
Scientific progress relies on creating comprehensive databases and data analysis tools that allow researchers to turn data into knowledge.
For example, when the new coronavirus was first identified in China, it took scientists only a few days to sequence its genome and publish the results for everyone to use. The speed of the analysis, and its worldwide publication, shows the advances in sequencing technology that have occurred since the SARS outbreak in 2003. At that time, it took almost three months to sequence the virus’s genome.
The genome is the virus’s entire genetic code; it contains clues about how the virus evolves, how it spreads and how it can be treated. The speedy publication of the new coronavirus genome gave the world an advantage in the race against the pathogen, for example bringing forward vaccine testing.
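To give a rough idea of how shared genomes can reveal such clues, the sketch below compares two virus samples against a reference sequence and lists the positions where they differ. Samples that share the same changes are likely to be closely related, which is one way researchers trace how an infection spreads. This is only a minimal illustration: the function name `find_substitutions` and the short sequences are made up for this example, and real analyses work on aligned genomes of roughly 30,000 letters using specialised tools.

```python
def find_substitutions(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference letter, sample letter) for each mismatch.

    Assumes both sequences are already aligned and have the same length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (position, ref_base, sample_base)
        for position, (ref_base, sample_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != sample_base
    ]


# Made-up toy sequences for illustration only, not real SARS-CoV-2 data.
reference = "ATGGCTAGCTTACGGA"
sample_a = "ATGGCTAGCTTACGGA"  # identical to the reference
sample_b = "ATGACTAGCTTACGCA"  # differs at two positions

print(find_substitutions(reference, sample_a))
print(find_substitutions(reference, sample_b))
```

Here `sample_a` produces an empty list, while `sample_b` is reported as differing from the reference at positions 4 and 15.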
Data sharing also aids drug discovery. For example, one international group of researchers analysed 26 of the 29 proteins that have been found in the new coronavirus. By comparing their findings against public databases of existing drugs, drugs in clinical trials and preclinical compounds, they identified 69 drug candidates that might act on the virus and that have the potential to treat infection and illness. The researchers are currently evaluating the efficacy of these drugs in clinical trials.
Although data sharing is a norm in the life sciences, there are several reasons why information and materials are not always made widely available. These include concerns about the security of personal data and about the ownership of patents. In addition, there are sometimes government restrictions on the export of patient samples, or simply a lack of technology, expertise and time needed to share information rapidly in an emergency. But a pandemic presents strong incentives to overcome these obstacles – because collaboration and cooperation are the best way to face a common enemy.
More sharing, faster progress
Much of what we know about the new coronavirus has been learned by sharing scientific data. Strengthening international collaboration and creating scientific data-sharing platforms are central to understanding and combatting the virus SARS-CoV-2 and the disease COVID-19.