Accounting for the Unknown in the Time of COVID-19: How Data Scientists Can Adapt


In this special guest feature, Jonathan Prantner, Co-founder & Chief Analytics Officer at RXA, discusses how, given the fast-paced nature of the world right now, data scientists working to understand and predict business impact will need to adjust current models or build new ones to make sense of COVID-19. Jonathan is a teacher, innovator and always a wizard behind the curtain. His approach to applied mathematics has pushed analytics to the limit for over two decades. He is a celebrated thought leader and recipient of multiple data science patents.

We are in uncharted and unpredictable territory across the globe. The last widespread pandemic was a hundred years ago and the world economy is more interconnected than at any point in history. While our collective first concern is for the health and safety of those affected by the virus, many of us are also tasked with keeping the economy running during a time of isolation, business closures and digital connection. Given the fast-paced nature of the world right now, data scientists working to understand and predict business impact will need to adjust current models or build new ones to make sense of COVID-19.

The first question that data scientists are being asked when it comes to the economy is “What does this mean for our business and how long will it last?” Data scientists are rooted in the numbers, and our models tend to look for hidden drivers and focus on the long-term trends. These models work for us in ‘normal’ times, when we can assess historic data, but they are failing to adapt to the daily changes caused by COVID-19. Standard leading economic indicators, such as new housing starts and industrial production indices, are slow to display recent fluctuations because of the cadence of their release. These indicators will see drastic swings once the reporting catches up to our current situation. Near-term indicators, such as stock market close and short-term interest rates, are reporting an immediate impact but are extremely volatile.

This leaves many data scientists with forecasting models that expect a small downswing based on the coincident indicators but treat the current crisis as a blip. At the other extreme, some models are built to pick up the small variations in the data and are giving wild estimates based on values well outside those seen in the training set.

In order to make the most sense of this ever-changing situation, data scientists need to focus on the insights they can provide rather than explaining the reasons why the models aren’t designed to adjust to a level of uncertainty not seen in a generation. Through designing and training forecasting models, I have developed a threefold path to delivering valuable insights in the current climate: track the virus progression, measure near-term demand, and measure built-up demand.

Tracking the virus progression is something that dominates the minds of individuals regardless of their background. Three main measures are used by business-focused data scientists to inform the COVID-19 economic impact: cases diagnosed, available hospital beds and recovered cases. Cases diagnosed is tied to the availability of testing and the spread of COVID-19. This measure provides a good proxy for the restrictions set in place on businesses and individuals in a geographic area. Available hospital beds is an indicator of how well the medical system is dealing with severe cases. The goal in flattening the curve is to keep this number as far above zero as possible, and until it rises to a comfortable level, we can expect restrictions to stay in place. Finally, when the number of recovered cases outpaces the number of new cases, this indicates that things are turning a corner for the positive.
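As a rough illustration of the last measure, the check below compares recoveries with new cases over a trailing window. It is only a sketch: the function name, the window length and the sample counts are hypothetical, and it assumes daily counts of newly diagnosed and newly recovered cases are already available for one geographic area.

```python
def recoveries_outpace_new_cases(daily_new_cases, daily_recovered, window=7):
    """Return True when recoveries exceed new cases over the trailing window.

    Both inputs are lists of daily counts for the same dates (an assumption
    of this sketch, not a requirement of any particular data source).
    """
    if len(daily_new_cases) != len(daily_recovered):
        raise ValueError("both series must cover the same dates")
    if window < 1 or len(daily_new_cases) < window:
        raise ValueError("not enough days of data for the chosen window")
    return sum(daily_recovered[-window:]) > sum(daily_new_cases[-window:])


# Hypothetical counts for one county over seven days.
new_cases = [50, 48, 45, 40, 36, 30, 25]
recovered = [20, 25, 30, 38, 42, 45, 50]

print(recoveries_outpace_new_cases(new_cases, recovered, window=3))
print(recoveries_outpace_new_cases(new_cases, recovered, window=7))
```

A short window reacts quickly but is noisy, while a longer one is steadier and slower to confirm a turn, so it may be worth watching more than one window at a time.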

Measuring near-term demand must be focused on actual sales and orders rather than indicators of demand. Scaling a slow-to-react forecast by the year-over-year drop in sales or orders provides a quick estimate of what may be coming in terms of demand. For those locations that rely heavily on a brick-and-mortar presence, measuring the in-store sales drop correlated with the three aforementioned COVID-19 indicators provides a good assessment of the impact of stay-at-home restrictions.
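The scaling step can be sketched in a few lines. The function name and all figures here are hypothetical, and the sketch assumes the simplest reading of the idea: one year-over-year ratio, computed from the most recent weeks, applied uniformly to every period of the existing forecast.

```python
def scale_forecast(baseline_forecast, sales_this_year, sales_last_year):
    """Scale a slow-to-react forecast by the recent year-over-year sales ratio."""
    last_year_total = sum(sales_last_year)
    if last_year_total <= 0:
        raise ValueError("last year's sales must be positive to compute a ratio")
    yoy_ratio = sum(sales_this_year) / last_year_total
    return [value * yoy_ratio for value in baseline_forecast]


# Hypothetical figures: the last two weeks of sales versus the same weeks a year ago.
recent_weeks_this_year = [60.0, 60.0]
same_weeks_last_year = [100.0, 100.0]

# Hypothetical output of the existing (slow-to-react) model for the next two weeks.
baseline = [110.0, 120.0]

adjusted = scale_forecast(baseline, recent_weeks_this_year, same_weeks_last_year)
print(adjusted)
```

A single ratio is a blunt instrument. Where the data allows, computing it separately by location or channel would likely give a more useful picture for businesses with both in-store and online sales.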

Finally, we turn our thoughts to the environment after the crisis. Great minds across the world are focused on treatment and prevention methods to reduce the humanitarian impact. For the business-focused data scientist, our role is to help our organizations forecast when their business will rebound and be ready to meet that demand. This is where indicators such as mobile movement data, website visits, or phone calls come together with the drop in sales, the expected sales without the crisis (this is where those slow-to-respond forecasting models still provide value) and the COVID-19 indicators that show we have turned a positive corner. In tandem, they let us know when things are starting to rebound so our organizations can be ready to scale up appropriately once the health threat has receded, helping to lessen the mid-term economic impact.
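One simple way to watch for that rebound is to track actual sales as a share of the expected sales without the crisis and flag a sustained rise. This is a minimal sketch under stated assumptions: the function names, the three-period rule and the sample numbers are all hypothetical, and a real signal would also weigh the activity and COVID-19 indicators described above.

```python
def recovery_ratios(actual_sales, expected_sales):
    """Actual sales as a share of the sales expected without the crisis."""
    if len(actual_sales) != len(expected_sales):
        raise ValueError("both series must cover the same periods")
    return [actual / expected for actual, expected in zip(actual_sales, expected_sales)]


def is_rebounding(ratios, periods=3):
    """Return True when the ratio has risen for `periods` consecutive periods."""
    recent = ratios[-(periods + 1):]
    if len(recent) < periods + 1:
        return False
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))


# Hypothetical weekly sales against a flat pre-crisis forecast.
actual = [100.0, 60.0, 50.0, 55.0, 62.0, 70.0]
expected = [100.0] * 6

ratios = recovery_ratios(actual, expected)
print(is_rebounding(ratios))      # uses all six weeks
print(is_rebounding(ratios[:4]))  # uses only the first four weeks
```

The same consecutive-rise check could be applied to website visits or phone calls, which may turn upward before sales do.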

That is what we all need to work on right now: getting prepared to succeed when companies and society implement a return-to-work policy.

Public access datasets used by RXA:

Daily reports, by U.S. county, of confirmed, recovered, and active cases: https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_daily_reports

Positive and negative tests by U.S. state: https://covidtracking.com/data

County social distancing metrics: https://www.arcgis.com/home/item.html?id=127a52f33bc54a16852c12665b07e7e0
