Challenges Observed & Lessons Learned
Over the past several months, the world has dramatically changed. The normal and customary way in which we live, work, socialize and interact was altered, possibly forever. Humanity is resilient and innovative and will surely overcome this challenge, but the pandemic has inflicted long-term damage on parts of the global economy, on individuals’ economic security and, of course, on families coping with the loss of loved ones.
Across the globe, brilliant healthcare professionals and scientists are scouring massive data sets looking for answers. However, that data lacks consistency and is often poorly captured, shared and aggregated. Further complicating the challenge are the disparate data and probability models that have produced far from accurate predictions and eroded public trust. For example, in May, the FDA released a public alert that some of the early testing data previously believed to be accurate may be flawed, creating problems for predictive modeling. This pandemic started to change our way of life over four months ago, yet we still don’t have true, accurate and widely trusted predictions of how we beat COVID-19.
From a data collection, aggregation and analytics perspective, it is important to look at the challenges facing the medical and scientific community, the root causes and some of the lessons learned to date. This assessment covers three main areas that have affected the way the data was recorded and reported, shared and used to fuel predictive models.
Issues and Challenges arising from Varying Data Models
Countries across the globe, including the U.S., used (and continue to use) vastly different data models. This disparity has led to huge swings in the projected number of COVID-19 deaths. Having such divergent data that was not properly aggregated and cleansed meant that the earliest predictive insights were misleading, and the overall integrity was questionable – especially when comparing numbers from country to country.
Even within the U.S., a plethora of predictive models have emerged from state and institutional levels, often generating conflicting interpretations about the course of the virus. This can be particularly problematic when these models are used to inform policy-making. During this pandemic, we are facing the irony that an abundance of data is producing a shortage of genuine insight when it comes to understanding and accurately representing the COVID-19 threat to the general public.
- Artificial Intelligence (AI) has great potential for use in disease modeling, and several independent research projects that leverage AI have emerged as the world urgently seeks solutions to COVID-19. A critical component of accurate modeling is having complete data, and this continues to prove elusive during this pandemic, as numbers and studies reach conflicting conclusions, leaving many wondering what to believe.
- Compounding these issues is a general lack of governance in terms of what data is being collected and how deaths are being reported. A huge barrier to our full understanding of COVID-19 comes from underreporting: of actual cases, of asymptomatic cases, and of virus-attributed deaths. This is due in part to a lack of common global reporting standards. ABC News recently outlined the need for governance and guidelines: “Experts are urging leaders to take measures right now to preserve data and medical specimens so that science has the chance to determine the precise number of people who succumbed during one of the most severe global pandemics in memory.”
Geographic and Cultural Differences
The varying ways countries report death rates could result in incomplete and incorrect data. An obvious example is that many religions do not permit autopsies. In some cultures, family members may opt not to inform authorities of a death, or may hide the true cause of death, to avoid retribution or community ostracism. Furthermore, some locations were hit so hard or so quickly (e.g. New York City) that correctly identifying and recording the cause of death was complicated by the sheer number of cases and the speed required to process those unfortunate people who did not survive the disease. The strain on the medical community was severe, as its members were simultaneously diagnosing and caring for critically ill patients while working hard to keep an accurate count and detailed notes of cases, hospital admissions, deaths, etc. In an interview with NPR, Maggie Koerth, a senior science writer for FiveThirtyEight, called out all of these COVID-19 issues as being a significant reason for problematic and flawed data.
- At the center of some of the confusion sits the federal guidance for those filling out death certificates. The guidance specifies: “COVID-19 should be reported on the death certificate for all deceased where the disease caused, or is assumed to have caused, or contributed to the death.” This was the guidance issued in the U.S., but more than 80 countries around the world did not and do not follow the same guidance. Varying rules and regulations make it nearly impossible to aggregate the numbers and get a true sense of the magnitude of the global crisis.
- Other events affected reporting and further complicated the accuracy of local, and therefore aggregate, data. One example is the religious holiday of Ramadan, which started on Saturday, April 25 this year. During Ramadan, Saudi Arabia watched the Kingdom’s numbers quadruple, with nearly 60,000 confirmed cases, making it the Arab world’s hotspot for infection. With so many cases in such a compressed period, it would be extremely difficult to determine if each death was directly related to the virus, thus further skewing the numbers.
Clearly, the data flowing from some regimes on the global stage is less than accurate and complete. How can we validate and account for skewed global numbers given how Russia, Iran and some other nations are reporting? For instance, according to an article dated March 23 and published in The World, Russia reported relatively few cases of COVID-19 compared with other nations: fewer than 500 confirmed cases for a population of around 144.5 million. The article goes on to share how many Russian citizens doubt the government’s numbers. Some have claimed Russia has been underreporting the instances of COVID-19 and classifying cases as pneumonia.
Why? How does this benefit Russia, and what impact does it have on the global tracking of the pandemic? Political motivations and optics are the likely culprits for Russian underreporting or misreporting of new cases of the virus, and Russia is not alone. With the recent declines in the price of oil and the value of the ruble, Russia has been pushed further into a financial corner. With true, accurate and trusted reporting (of much higher numbers), it stands to reason that the Russian government would have to mirror the shutdown tactics taken by many other countries, further injuring the economy and weakening the President’s position in the eyes of the Russian public. The Russian numbers are open to serious question, as are those in China and Iran. This makes calculating the true scale of the pandemic much more difficult.
Refining the way in which we collect, share and model data and socialize the insights is critical to improving the way we make data-driven decisions during any future crisis. Questions of data integrity and credibility, and concerns about privacy and security, are now at the forefront of public discourse in a way they were not just four months ago. This whole data ecosystem must learn quickly and work arduously to regain public trust and help manage future global crises should they occur.
Advanced analytics with traceability and open frameworks point to a future that can deliver progress AND satisfy a long list of concerned stakeholders. AI and Machine Learning (ML) offer improved statistical and scientific methods, such as classification and demographic segmentation techniques, for training models that adjust for historical reporting bias, regional discrepancies in data quality, and cultural and political challenges. These techniques improve the accuracy with which we assign appropriate weights to the data sets of different groups, taking into consideration all identified data bias and highlighting where discrepancies may lie. These emerging breakthroughs, while never perfect, are taking us rapidly towards predictions that will be far more accurate than what we have seen during this current global crisis.
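To make the idea of weighting data sets from different groups a little more concrete, here is a minimal Python sketch. It is an illustration only, not a description of any particular production framework: the `RegionReport` type, the `reliability` score, the `tolerance` threshold and all of the sample figures are hypothetical assumptions. In practice a reliability score would itself have to be estimated, for example from audits of past reporting.

```python
from dataclasses import dataclass


@dataclass
class RegionReport:
    """One region's reported figures (all fields are illustrative assumptions)."""
    region: str
    reported_rate: float  # reported deaths per 100,000 people
    reliability: float    # assumed trust score between 0 and 1
    population: int


def weighted_rate(reports):
    """Mean reported rate, weighted by reliability times population."""
    total_weight = sum(r.reliability * r.population for r in reports)
    if total_weight == 0:
        raise ValueError("at least one report needs a non-zero weight")
    weighted_sum = sum(
        r.reported_rate * r.reliability * r.population for r in reports
    )
    return weighted_sum / total_weight


def flag_discrepancies(reports, tolerance=0.5):
    """Regions whose rate differs from the weighted mean by more than
    `tolerance`, expressed as a fraction of that mean."""
    mean = weighted_rate(reports)
    return [
        r.region
        for r in reports
        if abs(r.reported_rate - mean) > tolerance * mean
    ]


# Made-up sample data, not real COVID-19 figures.
sample_reports = [
    RegionReport("Region A", reported_rate=10.0, reliability=1.0, population=100),
    RegionReport("Region B", reported_rate=20.0, reliability=0.5, population=100),
    RegionReport("Region C", reported_rate=2.0, reliability=0.25, population=200),
]

if __name__ == "__main__":
    print(weighted_rate(sample_reports))
    print(flag_discrepancies(sample_reports))
```

A flagged region is not necessarily wrong; the flag simply marks where a reported figure departs sharply from the weighted picture and deserves a closer look before it feeds a predictive model.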
I would be delighted to receive your feedback on any of the issues I have raised in this article, or to share information specific to Ness’s AI/ML service offerings and successful engagements helping clients realize optimal value from our ML Acceleration Framework.