Translator: 梁宇
Reviewer: 夏林
In finance, they are called black swans: those rare, extreme events that come along only once every decade or even once a century and can send markets reeling. The global coronavirus pandemic is certainly one. In data science and artificial intelligence circles, those same kinds of events are known by different names: edge cases, corner cases, or “out-of-distribution” datapoints. And most A.I. systems do not cope well when confronted with them.
The coronavirus pandemic is providing a real-world test of how robust many companies’ new-fangled A.I. systems really are.
Most of today’s machine learning systems need to be trained on lots of historical data. But what happens when the present suddenly stops looking like the recent past?
Most A.I.-driven trading algorithms, for instance, have only been implemented in the last five years. Their training data might not even have included the 2008 financial crisis. They almost certainly don’t include anything like the massive demand-driven shock we’re seeing across all industries right now.
So, some A.I.-driven investment strategies that were supposed to do well in all kinds of different market conditions have actually performed much worse than expected in the past few weeks.
Another example: Ocado, a popular online grocery business in the U.K., has seen traffic to its website spike four times higher than any previous peak the company has experienced in its 20-year history. In a conference call with reporters Thursday, Ocado spokesman David Shriver said so many visitors went to its website that the company’s cybersecurity software, which uses machine learning to detect aberrant behavior, assumed the site was experiencing a denial of service cyberattack and moved to block those connections. Luckily, human operations managers intervened to prevent that from happening.
What can a company do to make sure its machine learning models are able to cope with these extremes? Jay Schuren, a data scientist at DataRobot, a Boston startup that helps large corporations create and run machine learning models, has tips.
It’s vital that companies monitor their data models in real time. A grocery store that normally sells 22 cartons of milk a minute will want to know if it suddenly starts selling 10 times that amount. Not enough businesses do this today, Schuren says.
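As a rough illustration of that kind of monitoring, here is a minimal Python sketch that compares each new per-minute count with a rolling baseline. The `RateMonitor` class, the window size, and the z-score threshold are all assumptions made for this example; they are not described by Schuren or DataRobot.

```python
from collections import deque
from statistics import mean, stdev


class RateMonitor:
    """Flags counts that sit far outside a rolling baseline (hypothetical helper)."""

    def __init__(self, window=60, z_threshold=4.0):
        self.history = deque(maxlen=window)  # recent per-minute counts seen as normal
        self.z_threshold = z_threshold       # assumed alert threshold, in standard deviations

    def observe(self, count):
        """Return True if `count` looks anomalous against the baseline."""
        alert = False
        if len(self.history) >= 10:  # wait for a minimal baseline before alerting
            mu = mean(self.history)
            sigma = stdev(self.history) or 1.0  # guard against a perfectly flat baseline
            alert = abs(count - mu) / sigma > self.z_threshold
        if not alert:
            # Only non-alerting readings update the baseline, so a sustained
            # surge keeps alerting instead of quietly becoming the norm.
            self.history.append(count)
        return alert


monitor = RateMonitor()
normal_minutes = [22, 21, 23, 22, 24, 20, 22, 23, 21, 22, 23, 22]
alerts = [monitor.observe(c) for c in normal_minutes]
surge_alert = monitor.observe(220)  # ten times the usual rate
```

In this sketch the ordinary minutes raise no alert and the tenfold surge does. Keeping flagged readings out of the baseline is a design choice: it stops the monitor from adapting to the surge on its own, which means a person has to decide when the new level should count as normal.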
Businesses need to be proactive about identifying which machine learning models, and which input variables within those models, are most sensitive to extreme events. Anything that depends on human behavior, from electricity demand to shopping, will probably change because of Covid-19, he says.
Businesses need to think about the risks associated with different algorithms. If a system for placing ads goes haywire, that’s not good, but the consequences are a lot less severe than a system dispatching $1 million worth of products to a store that’s now shuttered due to social distancing measures.
A company’s data scientists should sit down with the business's subject-matter experts and stress-test a system in simulation: What items might customers want in a crisis? And what will happen to your supply management algorithm if you do get thousands of people wanting to purchase six months' worth of toilet paper in a week?
Data scientists can rejigger which inputs an A.I. system uses so the software is less thrown off by extreme variations. For instance, a model that uses the percentage change in prices as an input variable, rather than the prices themselves, will return to normal functioning faster.
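A small Python sketch can show why this might help. The price series and the "training range" limits below are invented for illustration; the point is only that after a one-time jump, raw price levels stay outside the range the model was trained on, while percentage changes leave that range just once.

```python
def pct_changes(prices):
    """Period-over-period percentage change for a list of prices."""
    return [(curr - prev) / prev * 100.0 for prev, curr in zip(prices, prices[1:])]


# Hypothetical series: stable prices, a shock that doubles them, then a new plateau.
prices = [100.0, 101.0, 100.0, 200.0, 202.0, 200.0]
changes = pct_changes(prices)

# Assumed limits of what a model saw during training (illustrative values).
train_level_max = 101.0   # highest price level in the training data
train_change_max = 1.0    # largest absolute percentage move in the training data

# For each period after the first: is the input outside the training range?
levels_out_of_range = [p > train_level_max for p in prices[1:]]
changes_out_of_range = [abs(c) > train_change_max + 1e-9 for c in changes]

print(sum(levels_out_of_range), "periods with out-of-range price levels")
print(sum(changes_out_of_range), "periods with out-of-range percentage changes")
```

With these made-up numbers, the level-based input is out of range in every period after the shock, while the change-based input is out of range only in the period of the shock itself.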
Companies should look for proxies that might exist in their data: Does this look like what happened during Hurricane Sandy or what happened during the 1973 oil crisis?
Finally, data scientists need to think carefully about whether they want the current coronavirus extremes included in future training data. For some systems, doing so might inoculate the software against being caught off guard by a similar crisis. But in a lot of other cases, it might have the opposite effect, leading the system to falsely expect that the crisis reflects a “new normal.” All those people stockpiling toilet paper today may have so much on hand they won’t need to buy any more for months. The result would be a sudden crash in demand in the near future that the A.I. system won’t be able to foresee, even though a human analyst would certainly expect it.
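One way to keep that choice open, sketched here in Python under assumed names and dates, is to tag the crisis window explicitly so that a model can be retrained with or without those rows. The function `split_by_crisis_window`, the sample rows, and the window boundaries are all hypothetical.

```python
from datetime import date


def split_by_crisis_window(rows, crisis_start, crisis_end):
    """Separate rows dated inside the crisis window from the rest."""
    normal, crisis = [], []
    for row in rows:
        in_crisis = crisis_start <= row["day"] <= crisis_end
        (crisis if in_crisis else normal).append(row)
    return normal, crisis


# Hypothetical daily toilet paper sales; dates and units are illustrative only.
rows = [
    {"day": date(2020, 2, 10), "units": 100},
    {"day": date(2020, 3, 16), "units": 900},
    {"day": date(2020, 3, 17), "units": 950},
    {"day": date(2020, 5, 4), "units": 40},
]

normal_rows, crisis_rows = split_by_crisis_window(
    rows, crisis_start=date(2020, 3, 1), crisis_end=date(2020, 4, 30)
)
```

Keeping the two sets apart does not settle which one to train on. It only makes it possible to compare a model fitted on `normal_rows` alone with one fitted on everything.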
Schuren says that companies could benefit from building families of different types of machine learning models for different conditions: one type that is more economically efficient, but more fragile, that they use in normal circumstances, and another that is maybe less efficient, but also less prone to break when confronted with abnormal data, that they can fall back on during extreme events.
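A minimal Python sketch of such a family might route each input to one of two models depending on whether it falls inside the range seen in training. Everything here is an assumption for illustration: the two stand-in models, the `ModelFamily` class, and the range limits are not taken from DataRobot's products.

```python
def efficient_model(demand_signal):
    """Stand-in for a finely tuned model fitted to normal conditions."""
    return 1.05 * demand_signal


def robust_model(demand_signal, cap=500.0):
    """Stand-in for a conservative model that caps its forecast."""
    return min(demand_signal, cap)


class ModelFamily:
    """Uses the primary model in familiar territory, the fallback otherwise."""

    def __init__(self, primary, fallback, train_min, train_max):
        self.primary = primary
        self.fallback = fallback
        self.train_min = train_min  # assumed lower bound of the training inputs
        self.train_max = train_max  # assumed upper bound of the training inputs

    def predict(self, x):
        """Return (name of the model used, its forecast)."""
        in_range = self.train_min <= x <= self.train_max
        model = self.primary if in_range else self.fallback
        return model.__name__, model(x)


family = ModelFamily(efficient_model, robust_model, train_min=50.0, train_max=300.0)
normal_choice = family.predict(200.0)    # inside the training range
extreme_choice = family.predict(2200.0)  # far outside the training range
```

A simple range check is the crudest possible switch. A real system would more likely combine it with monitoring alerts and a human decision about when to fall back.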