|Year : 2020 | Volume : 5 | Issue : 2 | Page : 226-231|
A data-driven approach to COVID-19: Resources, policies, and best practices
Meghana Aruru1, Raanan Gurewitsch2, Sarmistha Das3, Pramit Ghosh4, Bandana Sen5, Indranil Mukhopadhyay3, Saumyadipta Pyne6
1 Program Evaluation and Research Unit, Department of Pharmacy and Therapeutics, University of Pittsburgh School of Pharmacy; Health Analytics Network, Pittsburgh, PA, USA
2 Public Health Dynamics Laboratory, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
3 Human Genetics Unit, Indian Statistical Institute, Kolkata, India
4 Department of Community Medicine, Purulia Medical College, Purulia, West Bengal, India
5 All India Institute of Hygiene and Public Health, Kolkata, India
6 Health Analytics Network; Public Health Dynamics Laboratory, Graduate School of Public Health, University of Pittsburgh; Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
|Date of Submission|15-May-2020|
|Date of Decision|15-May-2020|
|Date of Acceptance|12-Jul-2020|
|Date of Web Publication|18-Dec-2020|
Dr. Saumyadipta Pyne
Public Health Dynamics Laboratory, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA; Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA; Health Analytics Network, Pittsburgh, PA
Source of Support: None, Conflict of Interest: None
The grand scale of the COVID-19 pandemic has affected all aspects of human life. It has revealed many systemic deficiencies worldwide in the understanding, preparedness, and control of the disease. To improve the situation, a data-driven approach can guide the use of resources, inform policies, and benefit from best practices in data acquisition, sharing, and dissemination. Public health decision-making and action depend critically on the timely availability of reliable data. In this study, we describe the data types and principles that are useful for a better understanding of the pandemic. We focus on public policies such as lockdown and social distancing, and we examine a possible impact of changes in mobility on different urban populations in the US. Finally, we discuss the potential of objective policies, such as limited and local lockdowns, to balance the dual goals of preventing contagion and maintaining economic stability, with careful consideration for vulnerable populations.
Keywords: COVID-19, data integration and analysis, data resources, policy
|How to cite this article:|
Aruru M, Gurewitsch R, Das S, Ghosh P, Sen B, Mukhopadhyay I, Pyne S. A data-driven approach to COVID-19: Resources, policies, and best practices. BLDE Univ J Health Sci 2020;5:226-31
|How to cite this URL:|
Aruru M, Gurewitsch R, Das S, Ghosh P, Sen B, Mukhopadhyay I, Pyne S. A data-driven approach to COVID-19: Resources, policies, and best practices. BLDE Univ J Health Sci [serial online] 2020 [cited 2021 Apr 14];5:226-31. Available from: https://www.bldeujournalhs.in/text.asp?2020/5/2/226/303966
The global impact of the COVID-19 pandemic is profound, and as the pandemic continues, gaps in data acquisition, surveillance, analysis, and dissemination make it difficult to estimate and predict its scale and time course. As of May 8, 2020, there were 3,759,967 confirmed COVID-19 cases and 259,974 reported deaths globally. To handle such sudden and severe stresses on their health and other systems, many countries rely on disease surveillance mechanisms based on the systematic collection and analysis of outbreak data. In the current digital age, a variety of digital data types are increasingly available (sometimes voluntarily, crowdsourced, and free of cost) with the potential to supplement traditional efforts to inform and guide public health and public policy in a more equitable, dynamic, and efficient manner.
In this study, we begin with a description of the different types of data that are relevant to the pandemic, and then outline policies and best practices for making these data useful in guiding public health and policy. We focus on social distancing and lockdown policies during the current pandemic and discuss their possible effects on the population in general and on the disease in particular. Finally, we recommend potential strategies that may allow policymakers to limit adverse disease outcomes in a more humane, efficient, and objective manner.
| Data Resources, Policies, and Best Practices|| |
The following is a nonexhaustive list of digital data types that are relevant to the present context:
- Pathogen data include viral genome, sequencing, and other data related to COVID-19. In the US, government repositories provide free-to-use viral genomic and other omics data and metadata to biomedical researchers.
- Clinical data are limited by concerns around data privacy, which restrict the availability of diagnostic testing data (such as reverse transcription-polymerase chain reaction), serological testing data (antibody titers), and data on the clinical profiles of individual patients. Currently, few resources provide clinical testing data, and those that do make them available only at the population level.
- Epidemiological data hold the key to guiding policy and decision-making. Several members of the scientific community are collaborating and openly sharing data, insightful trends, and visualizations.
- Societal data are increasingly used by scientists to gain a realistic understanding of disease dynamics and to predict transmission patterns in populations. Such data are collected either voluntarily (e.g., participatory syndromic data) or involuntarily (e.g., GPS-based mobility data). Monitoring real- or near-real-time responses across different aspects of public life could both help and be helped by public–private partnerships and administrative machinery.
- Environmental data provide diverse information on land use, human–animal interfaces, potential transmission routes, meteorological and ecological parameters, etc., which can support the modeling and mapping of risk across populations and the identification of disease hotspots.
- Policy data describe actions taken by policymakers to prevent the spread of infection and contain cases. These include data on health-care systems that enable scientists to model capacity, supply chains, and the availability of resources, and to predict shortages of personnel and equipment. Further, policy data from other sectors, such as agriculture, trade, law and order, transportation, and finance, play key roles in determining the impact of policy actions.
Although huge amounts of data can be generated, the data may be deemed genuinely ready for use only when practical and reasonable guidelines for their collection, analysis, standardization, and sharing are followed. As an example of such criteria, the US National Notifiable Diseases Surveillance System provides detailed steps for data stewardship, with good data management practices that go beyond proper collection, annotation, and archiving. The FAIR Data Initiative, an international consortium, outlines four foundational principles as applied to a given dataset: (1) Findability, (2) Accessibility, (3) Interoperability, and (4) Reusability. Together, the FAIR data principles serve to optimize the outcomes of data use, as described below:
- Findability: Data sharing for public health response is a cultural practice that improves knowledge discovery and benefits society at large. Criteria for findability include: (a) data are uniquely and persistently identifiable; (b) data can be re-found at any point in time and have rich metadata; (c) metadata are digitally actionable and allow distinction from other data; and (d) metadata are registered or indexed and are searchable.
- Accessibility: Once found, data must be accessible in various standard and compatible formats, with permission to freely download and integrate them, as follows: (a) data are accessible through a well-defined protocol; (b) the protocol is free, open, and universally implementable; (c) data are accessible upon appropriate authorization; and (d) metadata are accessible even when the data are not.
- Interoperability: Digital disease surveillance is one of the most exciting opportunities created by big data, as it has the potential to improve timeliness, provide the necessary spatial and temporal resolution, and improve access to many diverse populations. For datasets to adhere to FAIR standards, metadata should use vocabularies that themselves follow FAIR principles, digitally readable formats, and identifiers, and should allow references to other metadata. Such resources, for example, Project Tycho, are sometimes, but not often, freely available to researchers and policymakers. For health-care providers, interoperability will be a key requirement for the secure exchange of electronic health information across complex clinical databases.
- Reusability: While datasets are constructed rapidly during emergencies, they should have adequately described metadata, be linked with other sources, and meet community standards to ensure long-term usability and sustainability. Larger repositories may not accommodate smaller data types or datasets from low-throughput bench science. In response, many general data repositories, ranging from the institutional to the global level, have emerged. Such efforts are ushering in a new culture of data access and equity, in which high-volume, fast-stream data can be engaged and analyzed in real time for informed and comprehensive responses to pandemics and epidemics.
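To make the FAIR criteria above more concrete, the following is a minimal sketch of a metadata record and a rough checklist that flags missing FAIR-related elements. The record fields (`identifier`, `vocabulary`, `related_ids`, and so on) and the `fair_gaps` helper are hypothetical and illustrative; they do not follow any particular metadata standard, and passing the checks is not the same as FAIR compliance.

```python
from dataclasses import dataclass, field
from typing import List, Optional

OPEN_PROTOCOLS = {"https", "http", "ftp"}  # assumed set of free, open retrieval protocols


@dataclass
class DatasetRecord:
    """Hypothetical metadata record; field names are illustrative, not a standard schema."""
    title: str
    identifier: Optional[str] = None        # persistent identifier, e.g. a DOI
    description: str = ""
    keywords: List[str] = field(default_factory=list)
    access_url: Optional[str] = None        # where the data can be retrieved
    access_protocol: Optional[str] = None   # e.g. "https"
    vocabulary: Optional[str] = None        # controlled vocabulary used for variables
    related_ids: List[str] = field(default_factory=list)  # links to other metadata
    license: Optional[str] = None


def fair_gaps(rec: DatasetRecord) -> List[str]:
    """Return simple FAIR-related gaps in a record (a rough heuristic, not a certification)."""
    gaps = []
    # Findability: persistent identifier plus rich, searchable metadata
    if not rec.identifier:
        gaps.append("F: no persistent identifier")
    if not rec.description or not rec.keywords:
        gaps.append("F: metadata lacks description or keywords")
    # Accessibility: retrievable through a free, open protocol
    if not rec.access_url or (rec.access_protocol or "").lower() not in OPEN_PROTOCOLS:
        gaps.append("A: no access URL over an open protocol")
    # Interoperability: shared vocabulary and references to other metadata
    if not rec.vocabulary:
        gaps.append("I: no controlled vocabulary declared")
    if not rec.related_ids:
        gaps.append("I: no links to related metadata")
    # Reusability: a clear usage license
    if not rec.license:
        gaps.append("R: no usage license")
    return gaps
```

For example, `fair_gaps(DatasetRecord(title="County case counts"))` reports six gaps, one for each missing element, while a record that fills in every field returns an empty list.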
During a crisis, access to reliable and timely data is critical to identify vulnerable groups, develop measurably effective responses, and take action to protect local communities. Given the scale of this pandemic, the challenge lies in integrating diverse datasets to provide actionable inferences. To integrate data collected independently by various agencies, one must ensure that the datasets are consistent and comparable, which typically requires advance planning, investment, and coordination.
Technological advances allow sophisticated linkages through well-designed data warehouses and information systems. For instance, the US National Health Information Systems integrate data from a variety of information systems to provide policymakers and stakeholders with information on (a) the health status of a population and the impact of social determinants of health, (b) the quality of health services, and (c) resource availability. The US Census Bureau links its data with external datasets, such as administrative data (e.g., ethnic groups, businesses) and economic data (e.g., social vulnerability indices), to reduce the costs of data collection and maximize the use of existing data for various purposes.
Unlike in past pandemics, we are now in the midst of numerous high-velocity data streams (phones, social media, remote sensing satellites, internet use such as online commerce and search data, surveillance cameras, and sensor networks) that provide relentless, real-time information about spatial and temporal dynamics for individuals, communities, and entire populations. Key issues of data privacy, security, and ownership must be resolved through well-defined policies, even if temporary, to facilitate seamless cooperation among public, private, scientific, and governmental stakeholders. Data sharing probably requires a cultural shift, one that proactively enables the scientific community to rapidly develop models and deploy helpful tools that use the full scope of the available data landscape.
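As a small illustration of why independently collected datasets must be made consistent before they can be linked, the sketch below joins county-level case counts to population counts and reports cases per million. It assumes both sources use US county FIPS codes but store them inconsistently (as integers in one source, zero-padded strings in the other); the helper names and the example inputs are hypothetical and not drawn from any real dataset.

```python
from typing import Dict, List, Tuple


def normalize_fips(code) -> str:
    """Harmonize county FIPS codes to 5-digit zero-padded strings (e.g. 1001 -> '01001')."""
    return str(int(code)).zfill(5)


def link_and_rate(cases: List[Tuple[object, int]],
                  population: Dict[object, int]) -> Dict[str, float]:
    """Join case counts to population on a harmonized FIPS key; return cases per million.

    Case records whose key has no population match are dropped rather than guessed.
    """
    pop = {normalize_fips(k): v for k, v in population.items()}
    totals: Dict[str, int] = {}
    for code, n in cases:
        key = normalize_fips(code)
        totals[key] = totals.get(key, 0) + n  # pool duplicate reports for the same county
    return {k: 1e6 * n / pop[k] for k, n in totals.items() if k in pop and pop[k] > 0}
```

For instance, `link_and_rate([(1001, 50), ("01001", 25), ("99999", 3)], {"1001": 50000})` returns `{"01001": 1500.0}`: both reports for county 01001 are pooled once their keys agree, and the record with no population match is left out rather than imputed.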
| Translation from Data to Policy|| |
Different countries have adopted a variety of policies in response to the pandemic, with varying degrees of enforcement. However, it is difficult to ascertain whether all such policies were suitable for those societies or were motivated by the data available about the disease at the time they were applied. Their effects on mitigating the pandemic are becoming clearer as new data emerge daily. In [Figure 1], we note the worldwide adoption of common strategies in maps of (a) a government stringency index based on measures such as school and workplace closures and travel bans, (b) stay-at-home requirements, (c) COVID-19 testing policy, and (d) contact tracing, as of May 8, 2020. Further details on the features of the maps are available from the online resource Our World in Data. Notably, the lockdown in Wuhan, China, the first epicenter of the disease, lasted from January 23, 2020, to April 8, 2020.
|Figure 1: Worldwide adoption of policies in response to COVID-19. Maps show country-specific (a) government stringency index based on measures such as school and workplace closures and travel bans, (b) stay-at-home requirements, (c) COVID-19 testing policy, and (d) contact tracing, as of May 8, 2020. The maps were created using tools and data from OurWorldInData.org/coronavirus|
In India, the pandemic gave a sudden jolt to public life, which has gradually ground to a halt over the past 6 months. Amid initial uncertainty about the exact nature of the pathogen and the still-unfolding course of the pandemic, India, like many countries in the developed and developing world, is implementing social distancing through a lockdown to contain the transmission of the virus. As we write this article, lockdown is arguably the most effective, albeit somewhat “blunt,” weapon that any population possesses to fight this battle in the absence of a vaccine or a tested treatment protocol. Below, we recommend how the same instrument of lockdown could be used to achieve favorable community-specific outcomes that are more efficient, equitable, and cost-effective.
As an example of the power of data to provide insights into such large-scale social interventions, let us examine the effectiveness of restricting human mobility, a hallmark of any interactive society. In the US, mobility data, obtained through geolocation reporting by GPS in smartphones, indicate how visits to and lengths of stay at different places change relative to a baseline. In a recent study, such raw location data were used to identify the distance traveled by a smartphone user on any given day, after excluding data that were collected over short periods (<8 h) or had few (<10) reports on a given day. A secure, aggregated, and normalized mobility index was then computed using the median of these data for given areas in the US.
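The filtering and aggregation steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the cited study's pipeline: each report is assumed to be a `(user_id, hour_of_day, lat, lon)` tuple for one area and one day; a user's daily "distance traveled" is approximated, as an assumption, by the largest great-circle distance from that user's first report of the day; and the area index is the median retained distance expressed as a percentage of an assumed baseline median. The function names are hypothetical.

```python
import math
from collections import defaultdict
from statistics import median

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def daily_user_distances(reports, min_hours=8.0, min_reports=10):
    """Return one distance per retained user for a single area and day.

    Users observed over fewer than `min_hours` hours, or with fewer than
    `min_reports` reports, are excluded. The distance is the maximum distance
    from the user's first report of the day (an assumed proxy).
    """
    by_user = defaultdict(list)
    for user, hour, lat, lon in reports:
        by_user[user].append((hour, lat, lon))
    distances = []
    for pts in by_user.values():
        pts.sort()  # chronological order by hour
        if len(pts) < min_reports or pts[-1][0] - pts[0][0] < min_hours:
            continue
        _, lat0, lon0 = pts[0]
        distances.append(max(haversine_km(lat0, lon0, lat, lon) for _, lat, lon in pts))
    return distances


def mobility_index(day_reports, baseline_median_km):
    """Median retained-user distance for the area, as a percentage of a baseline median."""
    d = daily_user_distances(day_reports)
    return None if not d else 100.0 * median(d) / baseline_median_km
```

Using the median rather than the mean keeps a handful of long-distance travelers from dominating an area's value, which is one plausible reason for a median-based index; the baseline choice (for example, a pre-pandemic reference period) is left to the caller here.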
In this study, we used the mobility index data to investigate the effects of state-wise stay-at-home orders, issued by the governors of the respective states in response to the pandemic, on changes in urban mobility. For 12 US cities highly affected by COVID-19, we computed an intuitive, t-statistic-like measure of the shift in mean mobility index values over the 10 days before and the 10 days after the date on which the respective stay-at-home order came into effect. [Table 1] shows these results along with the corresponding increase in COVID-19 incidence per million inhabitants in these cities over the same 20-day period. We observed a moderate correlation between a greater reduction in the mobility index following the order and a lower COVID-19 incidence per million inhabitants (Pearson's coefficient: −0.42).
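The exact form of the t-statistic-like measure is not specified here, so the following is a minimal sketch that assumes a Welch-type statistic comparing the mean mobility index over the 10 days before an order with the 10 days after it (the order date itself is assigned, as an assumption, to the "after" window). The sign is chosen so that a drop in mobility gives a positive value. A plain Pearson correlation is included for relating the per-city statistics to incidence. Function names are hypothetical.

```python
import math
from statistics import mean, variance


def mobility_shift(series, order_idx, window=10):
    """Welch-type t statistic for the drop in mean mobility around an order date.

    series: daily mobility index values; order_idx: index of the day the order
    took effect. 'Before' is [order_idx - window, order_idx) and 'after' is
    [order_idx, order_idx + window). Positive values indicate reduced mobility.
    """
    if window < 2 or order_idx < window or order_idx + window > len(series):
        raise ValueError("need at least two and `window` days on each side of the order")
    before = series[order_idx - window:order_idx]
    after = series[order_idx:order_idx + window]
    se = math.sqrt(variance(before) / window + variance(after) / window)
    if se == 0:
        raise ValueError("no day-to-day variation; statistic undefined")
    return (mean(before) - mean(after)) / se


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)
```

Applied city by city, `mobility_shift` yields one reduction statistic per city; passing these, together with each city's incidence per million, to `pearson` gives a correlation coefficient. Because the statistic is positive for a drop in mobility, a negative coefficient means that larger drops go with lower incidence, which is the direction of the association reported above.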
|Table 1: Stay-at-home orders and new cases in COVID-19-affected US cities|
It is reasonable to expect that a prolonged lockdown will take its toll on any society, and an economic recession following the lockdown is almost inevitable in every country. A 2013 Reuters report warned that a flu pandemic could lead to a serious economic recession across the globe; that warning is now truer than ever. The sweeping shutdowns ushered in by the COVID-19 pandemic have confined more than one-third of the world's population indoors. Moreover, the prevailing uncertainty of the situation calls for dynamic, data-driven policies that take into account daily changes on the ground. For example, while mutations of the virus have recently been reported, it remains to be seen whether they will worsen or actually improve the pandemic situation.
Many of the economic repercussions of the global economic slowdown of 2019 have been significantly exacerbated by the ongoing pandemic. The global economy is expected to witness its worst downturn since the Great Depression, which began in 1929. Economists have estimated the extent of the global recession under the assumption that the pandemic peaks in April–June and recedes in July–December. The slowdown will affect population groups differently across the diverse sociopolitical settings of the globe. In addition to the immediate costs in human lives and wealth, extended periods of lockdown would threaten economic and social stability and mental health, increase behavioral risks, and may leave populations with chronic, longer-lasting vulnerabilities than can be envisaged at present.
To date, many countries have already observed upward of 40 days of lockdown, implemented in sequential intervals of 2–3 weeks. While some countries are willing to extend the lockdown further despite the risk of economic downfall, economists have warned that in such a scenario more people might die from hunger than from the virus. Recovery from the damage to industrial processes, such as disruptions in supply chains and migrant labor supply, may take years. It is therefore important to design policies that take an objective yet flexible approach to containment, with consideration of local conditions, so as to balance, at least to some extent, the dual concerns of health and economic security amid the pandemic. In this direction, stochastic epidemiological models may provide insights into the probable trajectories of the pandemic and thus inform multiple administrative steps toward a return to normalcy.
The idea of quarantining a small group of people after an epidemic outbreak to arrest the disease dates back to the 1950s, when the English statistician M.S. Bartlett introduced the concept of critical community size (CCS). Bartlett proposed that if the susceptible population in a small community drops below a certain threshold, the infection will not persist beyond a fixed time after the outbreak unless the disease is re-introduced from outside; he termed this threshold the CCS. We proposed a “Susceptible-Exposed-Infectious-Recovered” (SEIR) model of COVID-19 disease dynamics to calculate the region-specific expected “time to extinction” (TTE) and CCS, which together would determine the ideal number of lockdown days (termed the “temporary eradication of spread time” [TEST]) and the size of a quarantined population. We refer the interested reader to the different SEIR models developed for COVID-19 that are described in an online resource of the U.S. Centers for Disease Control and Prevention (CDC).
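For orientation, the textbook deterministic SEIR equations are shown below, where $\beta$ is the transmission rate, $1/\sigma$ the mean latent period, $1/\gamma$ the mean infectious period, and, in this basic form, $R_0 = \beta/\gamma$ the basic reproduction number. This is the generic form only; the model we used to compute TTE and CCS may differ in structure and parameterization.

```latex
\begin{aligned}
\frac{dS}{dt} &= -\beta \frac{S I}{N}, \\
\frac{dE}{dt} &= \beta \frac{S I}{N} - \sigma E, \\
\frac{dI}{dt} &= \sigma E - \gamma I, \\
\frac{dR}{dt} &= \gamma I, \qquad N = S + E + I + R .
\end{aligned}
```

In this deterministic form, $I(t)$ approaches zero but never reaches it in finite time, which is why quantities such as an extinction time or a critical community size are usually studied with stochastic versions of the model.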
Our work suggests that if people are quarantined in small groups no larger than the region-specific CCS, the disease will be contained after the corresponding TTE and will subside after the TEST. Lockdown should therefore be taken seriously in the fight against the pandemic until the TEST has elapsed. As with any other epidemic, COVID-19 could recur in waves, but we believe that with vigilant public health measures, including adequate sanitary practices and, hopefully, vaccination and specific antiviral treatments in future, the disease will be better managed through newer policies. For instance, the CDC has published guidelines for contact tracing via mobile apps that preserve the privacy of individuals. Based on real-time information tracking the spread of the disease, we suggest quarantining exposed individuals in groups as small as the region-specific CCS so that the disease might be contained, unless the infection is re-introduced from outside. Moreover, such a localized and limited approach to lockdown or quarantine could sustain a regular flow of daily life and activities and might save the overall economy, at least to some extent, from plummeting further.
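The link between group size and time to extinction can be explored with a stochastic simulation. The sketch below uses the Gillespie direct method for a closed SEIR group of size `n` (no births, deaths from other causes, or importations) and records the time at which no exposed or infectious individuals remain. The `largest_group_within` search, which returns the largest candidate group size whose mean TTE does not exceed a target number of days, is a simplified, hypothetical illustration of a CCS-style threshold; it is not the estimation procedure of our model, and any parameter values passed to these functions are illustrative rather than estimates.

```python
import random


def seir_extinction_time(n, beta, sigma, gamma, i0=1, rng=None, t_max=1e4):
    """Simulate a closed stochastic SEIR epidemic in a group of size n
    (Gillespie direct method); return the time at which no exposed or
    infectious individuals remain (time to extinction), capped at t_max."""
    rng = rng or random.Random()
    s, e, i = n - i0, 0, i0
    t = 0.0
    while e + i > 0 and t < t_max:
        # Event rates: infection (S->E), onset of infectiousness (E->I), recovery (I->R)
        rates = (beta * s * i / n, sigma * e, gamma * i)
        total = sum(rates)            # > 0 whenever e + i > 0, given sigma, gamma > 0
        t += rng.expovariate(total)   # exponential waiting time to the next event
        u = rng.random() * total      # pick which event occurs, proportional to its rate
        if u < rates[0]:
            s, e = s - 1, e + 1
        elif u < rates[0] + rates[1]:
            e, i = e - 1, i + 1
        else:
            i -= 1
    return min(t, t_max)


def mean_tte(n, beta, sigma, gamma, reps=200, seed=0):
    """Average extinction time over independent replicate simulations."""
    rng = random.Random(seed)
    return sum(seir_extinction_time(n, beta, sigma, gamma, rng=rng)
               for _ in range(reps)) / reps


def largest_group_within(target_days, sizes, **params):
    """Hypothetical CCS-style search: the largest size in `sizes` whose mean TTE
    does not exceed `target_days` (None if no size qualifies)."""
    ok = [n for n in sizes if mean_tte(n, **params) <= target_days]
    return max(ok) if ok else None
```

Because each run is random, `mean_tte` averages over replicates with a fixed seed for reproducibility; with few replicates the estimate is noisy, so the search checks every candidate size instead of assuming that mean TTE increases monotonically with group size.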
Indeed, a simultaneous and absolute lockdown may be difficult to enforce across the length and breadth of an entire country, especially one with many diverse communities such as India. A localized and limited lockdown of objectively determined high-risk subpopulations could therefore be an economically more viable option than an indiscriminate, broadly disruptive lockdown. To be effective, it should be applied less like a blunt weapon and more like a fine healing instrument, informed by ethics and ground realities. It may also be better managed administratively, as communities have distinct needs that must be met for the lockdown to be effective. This would further require comprehensive screening for cases and identification of the critical number of contacts. Finally, a systematic study of age-specific seroprevalence must be conducted to guide specific policies such as the strategic reopening of schools, workplaces, and public spaces.
| Discussion|| |
A century after the 1918 Spanish flu pandemic, at a 2018 meeting in Geneva, the World Health Organization warned about the possibility of a zoonotic pandemic caused by a novel pathogen, which it enigmatically called “Disease X.” In fact, Disease X was included in the WHO's “2018 list of diseases to be prioritized under the R&D Blueprint.” In earlier work, we noted the risk of bat-borne SARS-like coronaviruses. Today, as we face such a crisis, the COVID-19 pandemic underscores an urgent need for concerted scientific and administrative action across the globe. It cannot be confronted without full and sustained cooperation among hundreds of agencies worldwide that share data and information promptly and accurately.
While it may be difficult to predict the myriad effects of any major policy during a pandemic, it is easier to commit to a culture of apolitical, multisectoral, data-driven understanding of the pandemic. Conducting surveillance with accurate figures at high spatiotemporal resolution is paramount. Accurately determining deaths caused by the virus will remain yet another serious issue. Delays or imprecision in reporting come at a high cost, which must be averted through well-executed data-sharing policies. Standards and guidelines aimed at improving data quality and ensuring data availability for long-term use are critical. In fact, data quality control pays off in the “long run” in terms of high-quality research output.
Currently, among the best-known resources for free data downloads is the COVID-19 Data Repository maintained by the Center for Systems Science and Engineering at Johns Hopkins University. Its cumulative totals include presumptive positive and probable cases and deaths, in accordance with CDC guidelines as of April 14, 2020. Many research teams are sharing their COVID-19 data products through other generic data repositories, for example, GitHub. In addition, digital archives are making manuscripts available free of cost at an early, prepublication stage. These and numerous other examples of concerted, proactive global collaboration and ready adoption of a data-driven scientific approach remain our best hope in the fight against the pandemic. We conclude by noting that while news of initial successes in containing the disease has emerged from some countries, overall there is still a long way to go.
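As an example of working with such a repository, the sketch below parses a wide-format time-series CSV in the layout used for the Johns Hopkins global confirmed-cases file (columns `Province/State`, `Country/Region`, `Lat`, `Long`, then one cumulative count per date), sums provinces into country totals, and converts cumulative counts into daily new counts. The URL constant reflects the file's location in the repository at the time of writing and is an assumption; the repository may since have moved or been archived. The helper names are hypothetical.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List

# Assumed location of the global confirmed-cases time series in the JHU CSSE GitHub repository.
JHU_CONFIRMED_URL = ("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/"
                     "csse_covid_19_data/csse_covid_19_time_series/"
                     "time_series_covid19_confirmed_global.csv")


def country_totals(csv_text: str) -> Dict[str, List[int]]:
    """Sum cumulative counts over provinces for each country, one value per date column."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    n_dates = len(header) - 4  # first four columns: Province/State, Country/Region, Lat, Long
    totals: Dict[str, List[int]] = defaultdict(lambda: [0] * n_dates)
    for row in reader:
        country = row[1]
        for j, value in enumerate(row[4:]):
            totals[country][j] += int(float(value or 0))  # blank cells counted as 0
    return dict(totals)


def daily_new(cumulative: List[int]) -> List[int]:
    """Convert a cumulative series into daily new counts.

    Negative values can occur and usually flag downward revisions in the source data.
    """
    if not cumulative:
        return []
    return [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]
```

The text itself can be retrieved with the standard library, for example `urllib.request.urlopen(JHU_CONFIRMED_URL).read().decode("utf-8")`; keeping parsing separate from downloading makes it easy to test the parser on a small local sample.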
SD, PG, and IM acknowledge support from the NIH Fogarty International Center subaward R25 TW009717.
Financial support and sponsorship
Conflicts of interest
There are no conflicts of interest.
| References|| |
World Health Organization. Coronavirus Disease. World Health Organization; 2020. p. 2633.
Pyne S, Vullikanti AK, Marathe MV. Big data applications in health sciences and epidemiology. In: Govindaraju V, Raghavan V, Rao CR, editors. Handbook of Statistics: Big Data Analytics. Vol. 33. Elsevier B.V.; 2015. p. 171-202.
Ting DS, Carin L, Dzau V, Wong TY. Digital technology and COVID-19. Nat Med 2020;26:459-61.
COVID Near You. Boston Children's Hospital and Harvard Medical School. Available from: https://covidnearyou.org/. [Last accessed on 2020 May 17].
HOT COVID-19 Response. Humanitarian OpenStreetMap Team. Available from: https://www.hotosm.org/. [Last accessed on 2020 May 17].
National Notifiable Diseases Surveillance System. Centers for Disease Control and Prevention. Available from: https://wwwn.cdc.gov/nndss/. [Last accessed on 2020 May 17].
Wilkinson M, Dumontier M, Aalbersberg I, Appleton G, Axton M, Baak A, et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 2016;3:1-9.
Van Panhuis W, Burke D, Cross A. Project Tycho. University of Pittsburgh. Available from: https://www.tycho.pitt.edu. [Last accessed on 2020 May 17].
Michelsen K, Brand H, Achterberg P, Wilkinson J. Promoting better integration of health information systems: Best practices and challenges. Copenhagen WHO Reg Off Eur Health Evid Netw Synth Rep 2015:40. Available from: http://apps.who.int//iris/handle/10665/152819. [Last accessed on 2020 May 17].
Pyne S, Ray S, Gurewitsch R, Aruru M. Transition from social vulnerability to resiliency vis-à-vis COVID-19. Stat Appl 2020;18:197-208.
Ross GJ. Parametric and nonparametric sequential change detection in R: The cpm Package. J Stat Softw 2015;66:1-19. [doi: 10.18637/jss.v066.i03].
Dawood AA. Mutated COVID-19 may foretell a great risk for mankind in the future. New Microbes New Infect 2020;35:100673.
Bartlett M. Measles periodicity and community size. J R Stat Soc Ser A (Gen) 1957;120:48-70.
Bartlett M. The critical community size for measles in the United States. J R Stat Soc Ser A 1960;123:37-44.
Das S, Ghosh P, Sen B, Pyne S, Mukhopadhyay I. Critical community size for COVID-19: A model based approach for strategic lockdown policy. Stat Appl 2020;18:181-96.
Pyne S, Lee S, McLachlan G. Nature and man: The goal of bio-security in the course of rapid and inevitable human development. J Indian Soc Agric Stat 2015;69:117-25.
Moorthy V, Henao Restrepo AM, Preziosi MP, Swaminathan S. Data sharing for novel coronavirus (COVID-19). Bull World Health Organ 2020;98:150.