
Data Analytics and Privacy in the Time of COVID-19

Attempting to track and mitigate the impacts of a global pandemic is proving to be like looking for the proverbial needle in a stack of other needles. Fortunately, we have recently made huge strides in data identification, collection and integration, and new tools and processes (including artificial intelligence) have improved the speed and precision needed to reveal critical insights. Thanks to these technological innovations, data analytics is playing a pivotal role in mankind’s battle against COVID-19.

Some memorable examples of these valuable data-driven insights:

  • Kaggle, the data scientist community website, has posted a comprehensive dataset on COVID-19 infections, deaths and recoveries, as well as a collection of scientific papers on COVID-19 and related viral diseases. Data scientists around the globe have been analyzing this data to look for insights that can help stop the disease. The Kaggle community’s findings to date include observations about risk factors, symptoms, seasonality, virus persistence, the incubation period and diagnostics that provide health researchers with valuable corroborative data.
  • Researchers are using Google search trends to predict regions where COVID-19 outbreaks are about to occur. As the virus incubates in a given region, local citizens begin to observe symptoms such as loss of smell, joint aches and a dry cough. Those people often turn to Google to search for more information on their health. By tracking regionally aggregated Google searches, the researchers can predict areas where the disease is spreading, even in the absence of medical testing. They were also able to determine a previously unidentified COVID-19 symptom based on Google search patterns: eye pain.
  • Google and Apple have teamed up to create a contact-tracing smartphone app that can determine who has been in contact with an infected person, while still preserving privacy for the individual. Suppose a government requires its citizens to install the app, or requires its cellular carriers to automatically install the app. As a smartphone’s owner moves around, the app connects with nearby smartphones via Bluetooth radio, which has a range of around 30 feet, and stores the nearby phones’ IDs in the local smartphone storage, rather than a central server. When someone tests positive for the virus, their smartphone ID is broadcast to each smartphone’s app, which can then compare it with the data stored locally in the smartphone. Thus, the app can determine whether that smartphone owner came in contact with the infected person and should therefore also be quarantined. The smartphone owner is alerted and can make the necessary lifestyle adjustments, and the app can report the diagnosis back to the medical authorities who can act to enforce the quarantine, without ever revealing any citizen’s detailed location information.
  • Data analytics can also help us understand emerging challenges within the US supply chain. For example, what really caused America’s toilet paper shortage? It’s not that people are using more toilet paper; what’s changed is where they are using it. Home consumption has skyrocketed, since many people are sheltering in place, while commercial consumption has plummeted, creating a significant backlog of commercial toilet paper that has needed to be re-sold into a different supply chain. The message for manufacturers: there is no immediate need to manufacture more. First, adjust the balance between the consumer and commercial supply chains.

These efforts are admirable because they manage to provide insights that help people, without sacrificing the privacy of the individual. The Google search trends are aggregated by region, so it is impossible to trace a specific individual’s searches. The contact tracing app’s location information remains in each individual smartphone, rather than being uploaded to a central server where it might be used to violate citizens’ privacy.
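
To make the local-storage idea concrete, here is a minimal Python sketch of the matching step as this post describes it. It is an illustration only, not the actual Google and Apple design: the `Phone` class, its method names and the single-letter IDs are all hypothetical, and real deployments add protections (such as rotating identifiers) that are left out here.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Phone:
    """Hypothetical model of a phone that keeps its contact log locally."""
    phone_id: str
    seen_ids: Set[str] = field(default_factory=set)

    def record_nearby(self, other: "Phone") -> None:
        # Each phone stores the other's ID in its own local storage;
        # nothing is uploaded to a central server.
        self.seen_ids.add(other.phone_id)
        other.seen_ids.add(self.phone_id)

    def was_exposed(self, positive_ids: Set[str]) -> bool:
        # The comparison runs on the phone itself, against its local log.
        return not self.seen_ids.isdisjoint(positive_ids)


# Assumed scenario: Alice meets Bob; Carol meets nobody.
alice, bob, carol = Phone("A"), Phone("B"), Phone("C")
alice.record_nearby(bob)

# Alice tests positive, so her ID is broadcast to every phone.
positive_ids = {"A"}
exposed = [p.phone_id for p in (bob, carol) if p.was_exposed(positive_ids)]
```

The only thing that ever leaves a phone in this sketch is the ID of someone who tested positive. Each phone's list of contacts stays in its own `seen_ids`, which is the property that keeps the approach privacy-preserving.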

In the heat of the battle against COVID-19, not all technology solutions have been so considerate of individual privacy. For example:

  • Israel approved emergency rules allowing security agencies to use mobile phone location data to perform contact tracing. This requires the cell phone carriers to provide the government with their fine-grained data about phone location for all citizens.
  • Russia tracks urban residents’ compliance with lockdowns via a network of tens of thousands of cameras, coupled with facial recognition software that has been modified to identify people wearing face masks.

As mankind mobilizes to overcome a common threat, it is wonderful to see how much computer science in general, and data analytics in particular, has contributed to the fight. In the face of a global pandemic and unprecedented, life-altering challenges, it is all too easy to view data privacy as a luxury. However, we must be aware of the danger in such emergency concessions, lest they become the new normal. History shows that citizens’ rights, once ceded, are very difficult to restore. When there are privacy-preserving alternatives, like the Bluetooth-based local storage for contact tracing, they should be preferred over more sweeping privacy-threatening options. When there is no other option, citizens need to understand how the collected data will be protected from abuse, and when the emergency privacy exception will expire.

These same guidelines apply to companies. Enterprises that were well on their way to digital transformation before COVID-19 should see little disruption in the productivity of their employees, and little deterioration in their ability to communicate with their customers. Organizations that were behind the curve before the pandemic now find themselves rushing to set up efficient digital communication channels with their employees and their customers. But under this pressure to update quickly, they are vulnerable to data privacy mistakes that could go on to sink their business. For example, a company that accidentally exposes its customers’ data to hackers no longer has a temporary COVID-19 problem that will pass when the virus passes; it has a much longer-lasting problem of regaining customer trust.

Similarly, some companies already had a well-defined data process and architecture in place, with the ability to automatically ingest data globally, cleanse the data at scale using Machine Learning–based tools, and then monitor, measure, adjust and reassess in near real time. Organizations that are now trying to improvise these processes on the fly, so they can function in the new business reality, risk instituting processes and creating corporate risk that could cripple them and their growth prospects.

Companies that need to improvise their data access, policies or architecture to cope with COVID-19 should proceed with caution and seek advice from data experts first. Otherwise, their short-term data problem could become a longer-term corporate viability problem.