The Limits of Big Data
Stop me if you've heard this one: Three statisticians go rabbit hunting. They spot a rabbit. The first statistician shoots and misses the rabbit's head by a foot. The second statistician fires and misses the rabbit's tail by a foot. The third statistician cries out, "We got him!"
Even if you don't find this joke remotely amusing, you've probably worked with exactly the kind of managerial rabbit hunters it describes. Their math may be impeccable but their real-world results, alas, are rubbish. Lies, damned lies, etc. What must organizations know to improve the odds that their quants will deliver real value instead of statistical illusions? How can stochastically innumerate executives be sure they're not being bamboozled by Big Data?
Excellent answers can be found in Samuel Arbesman's The Half-Life of Facts and Nate Silver's The Signal and The Noise, two distinct but complementary efforts that explore how "data" become "evidence" and why so many sophisticated mathematical models fail so spectacularly at distinguishing the two. The books embrace and extend the themes of uncertainty and quantitative self-deception articulated by Nassim Taleb's popular and insightful Fooled By Randomness and The Black Swan, as well as Nobel laureate Daniel Kahneman's superior Thinking, Fast and Slow. Like their precursors, Arbesman and Silver have produced entertainingly actionable books.
Both authors cite the cynically apt line -- variously attributed to Mark Twain, Will Rogers and Charles Kettering -- that "It ain't so much the things we don't know that get us into trouble. It's the things we know that just ain't so." Both discuss the media and mechanisms used to distinguish between "real" knowledge and "ain't so's." Arbesman and Silver both argue persuasively that the "ain't so's" are winning. The more data you deal with, the more attention that case deserves.
Arbesman, an applied mathematician and fellow at Harvard's Institute for Quantitative Social Science, deconstructs what it means to be a fact. He mercifully avoids getting bogged down in post-modernist philosophy. Instead he explores how serious scientists attempt to nail down what it is they think they know about what they're studying. This "scientometric" approach -- the science of how science measures its process and progress -- proves extraordinarily helpful in identifying the lifecycles and ecosystems of what scientists call "facts." This approach allows Arbesman to ask intriguing questions, such as: How are "facts" born? How do they typically replicate, mutate and evolve? How long do they take to die?
The provocative core of Arbesman's argument is that there is a virtual physics of facts. Depending upon how they're defined and measured, "facts" follow defined laws and trajectories. "Every day that we read the news we have the possibility of being confronted with a fact about our world that is wildly different from what we thought we knew," he writes. "…But it turns out that these rapid changes, while true phase transitions in our knowledge, are not unexpected or random. We understand how they behave in the aggregate, through the use of probability, but we can also predict these changes by searching for the slower, regular changes in our knowledge that underlie them. Fast changes in facts, just like everything else we've seen, have an order to them. One that is measurable and predictable."