In finance, they are called black swans. Those rare, extreme events that come along only once every decade or even once a century and can send markets reeling. The global coronavirus pandemic is certainly one. In data science and artificial intelligence circles, those same kinds of events are known by different names: edge cases, corner cases, or “out-of-distribution” datapoints. And most A.I. systems do not cope well when confronted with them.
The coronavirus pandemic is providing a real-world test of how robust many companies’ new-fangled A.I. systems really are. Most of today’s machine learning systems need to be trained on lots of historical data. But what happens when the present suddenly stops looking like the recent past?
Most A.I.-driven trading algorithms, for instance, have only been implemented in the last five years. Their training data might not even include the 2008 financial crisis. It almost certainly doesn’t include anything like the massive demand-driven shock we’re seeing across all industries right now.
So, some A.I.-driven investment strategies that were supposed to do well in all kinds of different market conditions have actually performed much worse than expected in the past few weeks.
Another example: Ocado, a popular online grocery business in the U.K., has seen traffic to its website spike four times higher than any previous peak the company has experienced in its 20-year history. In a conference call with reporters Thursday, Ocado spokesman David Shriver said so many visitors went to its website that the company’s cybersecurity software, which uses machine learning to detect aberrant behavior, assumed the site was experiencing a denial of service cyberattack and moved to block those connections. Luckily, human operations managers intervened to prevent that from happening.
What can a company do to make sure its machine learning models are able to cope with these extremes? Jay Schuren, a data scientist at DataRobot, a Boston startup that helps large corporations create and run machine learning models, has tips.
It’s vital that companies monitor their data and models in real time. For a grocery that normally sells 22 cartons of milk a minute, you want to know if you suddenly start selling 10 times that amount. Not enough businesses do this today, Schuren says.
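To make the milk example concrete, here is a minimal sketch of that kind of check in Python. It is an illustration only, not DataRobot's tooling: the function name make_spike_monitor, the 60-reading window, and the 10x threshold are all assumptions chosen to match the numbers above.

```python
from collections import deque
from statistics import mean


def make_spike_monitor(window=60, ratio=10.0):
    """Return a checker that flags a reading far above the recent average.

    window and ratio are illustrative defaults, not recommended values.
    """
    history = deque(maxlen=window)  # most recent per-minute readings

    def check(units_sold):
        # Compare against the baseline before adding the new reading to it.
        baseline = mean(history) if history else None
        history.append(units_sold)
        return (
            baseline is not None
            and baseline > 0
            and units_sold >= ratio * baseline
        )

    return check


check = make_spike_monitor()
normal_minutes = [check(22) for _ in range(30)]  # ordinary sales, never flagged
spike_flag = check(220)                          # ten times the usual rate
```

A real monitor would also need to decide whether flagged readings should feed back into the baseline. This sketch simply keeps every reading.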
Businesses need to be proactive about identifying which machine learning models, and which input variables within those models, are most sensitive to extreme events. Anything that depends on human behavior, from electricity demand to shopping, will probably change because of Covid-19, he says.
Businesses need to think about the risks associated with different algorithms. If a system for placing ads goes haywire, that’s not good, but the consequences are a lot less severe than a system dispatching $1 million worth of products to a store that’s now shuttered due to social distancing measures.
A company’s data scientists should sit down with the business's subject-matter experts and stress-test a system in simulation: What items might customers want in a crisis? And what will happen to your supply management algorithm if you do get thousands of people wanting to purchase six months' worth of toilet paper in a week?
Data scientists can rejigger which inputs an A.I. system uses so the software might be less thrown off by extreme variations: For instance, rather than using prices as an input variable, a model that uses the percentage change in prices instead will return to normal functioning faster.
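A small Python sketch shows why the percentage-change input recovers faster. The prices below are made up for illustration, and the helper names pct_changes and in_range are hypothetical. The idea is that after a shock, raw prices can sit at a level the model never saw in training, while period-over-period changes drift back into familiar territory once prices settle.

```python
def pct_changes(prices):
    """Period-over-period percentage change for a list of prices."""
    return [(b - a) / a * 100.0 for a, b in zip(prices, prices[1:])]


def in_range(value, seen):
    """True if value falls inside the range covered by the training data."""
    return min(seen) <= value <= max(seen)


# Made-up numbers: a calm training period, then a sharp drop to a new level.
train_prices = [100.0, 102.0, 99.0, 101.0, 100.0]
crisis_prices = [100.0, 70.0, 69.0, 70.0, 69.5]

train_changes = pct_changes(train_prices)
crisis_changes = pct_changes(crisis_prices)

# Raw price levels stay outside the training range for the whole period.
levels_ok = [in_range(p, train_prices) for p in crisis_prices[1:]]

# Percentage changes are out of range only at the moment of the shock.
changes_ok = [in_range(c, train_changes) for c in crisis_changes]

print(levels_ok)
print(changes_ok)
```

In this toy series, every post-shock price is out of range, but only the first percentage change is. Whether that holds for a real model depends on the data and on how the model uses its inputs.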
Companies should look for proxies that might exist in their data: Does this look like what happened during Hurricane Sandy or what happened during the 1973 oil crisis?
Finally, data scientists need to think carefully about whether they want the current coronavirus extremes included in future training data. For some systems, doing so might inoculate the software against being caught off guard by a similar crisis. But in a lot of other cases, it might have the opposite effect, leading the system to falsely expect that the crisis reflects a “new normal.” All those people stockpiling toilet paper today may have so much on hand they won’t need to buy any more for months, resulting in a sudden crash in demand in the near future that the A.I. system won’t be able to foresee, even though a human analyst would certainly expect it.
Schuren says that companies could benefit from building families of different types of machine learning models for different conditions: one type that is more economically efficient, but more fragile, that they use in normal circumstances, and another that is maybe less efficient, but also less prone to break when confronted with abnormal data, that they can fall back on during extreme events.
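One way to picture that fallback arrangement is a simple router that sends normal-looking inputs to the efficient model and everything else to the sturdier one. The Python sketch below is an assumption-laden illustration, not Schuren's or DataRobot's method: make_router, efficient_model, robust_model, and the 0 to 50 "normal" range are all hypothetical stand-ins.

```python
from statistics import median
from typing import Callable, Sequence

Model = Callable[[Sequence[float]], float]


def make_router(primary: Model, fallback: Model, low: float, high: float) -> Model:
    """Use primary when every input is within [low, high], else fallback."""

    def predict(inputs: Sequence[float]) -> float:
        if all(low <= x <= high for x in inputs):
            return primary(inputs)
        return fallback(inputs)

    return predict


def efficient_model(inputs: Sequence[float]) -> float:
    # Hypothetical stand-in: assumes the next value equals the latest one.
    return inputs[-1]


def robust_model(inputs: Sequence[float]) -> float:
    # Hypothetical stand-in: the median is less swayed by a single extreme.
    return median(inputs)


# The 0-50 range is an assumed definition of "normal" demand per minute.
predict = make_router(efficient_model, robust_model, low=0.0, high=50.0)

normal_forecast = predict([20.0, 22.0, 24.0])    # handled by efficient_model
extreme_forecast = predict([20.0, 22.0, 240.0])  # handled by robust_model
```

In practice the switch would more likely be triggered by a monitoring signal or a human decision than by a fixed input range, but the structure is the same: two models, one rule for choosing between them.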