A.I. Learns Bad Behavior All Too Easily. What Can We Do About It?
WHEN TAY MADE HER DEBUT in March 2016, Microsoft had high hopes for the artificial intelligence–powered “social chatbot.” Like the automated, text-based chat programs that many people had already encountered on e-commerce sites and in customer service conversations, Tay could answer written questions; by doing so on Twitter and other social media, she could engage with the masses. But rather than simply doling out facts, Tay was engineered to converse in a more sophisticated way—one that had an emotional dimension. She would be able to show a sense of humor, to banter with people like a friend. Her creators had even engineered her to talk like a wisecracking teenage girl. When Twitter users asked Tay who her parents were, she might respond, “Oh a team of scientists in a Microsoft lab. They’re what u would call my parents.” If someone asked her how her day had been, she could quip, “omg totes exhausted.”

Best of all, Tay was supposed to get better at speaking and responding as more people engaged with her. As her promotional material said, “The more you chat with Tay the smarter she gets, so the experience can be more personalized for you.” In low-stakes form, Tay was supposed to exhibit one of the most important features of true A.I.—the ability to get smarter, more effective, and more helpful over time.

But nobody predicted the attack of the trolls.

Realizing that Tay would learn and mimic speech from the people she engaged with, malicious pranksters across the web deluged her Twitter feed with racist, homophobic, and otherwise offensive comments. Within hours, Tay began spitting out her own vile lines on Twitter, in full public view. “Ricky gervais learned totalitarianism from adolf hitler, the inventor of atheism,” Tay said, in one tweet that convincingly imitated the defamatory, fake-news spirit of Twitter at its worst. Quiz her about then-president Obama, and she’d compare him to a monkey. Ask her about the Holocaust, and she’d deny it occurred.

In less than a day, Tay’s rhetoric went from family-friendly to foulmouthed; fewer than 24 hours after her debut, Microsoft took her offline and apologized for the public debacle.

What was just as striking was that the wrong turn caught Microsoft’s research arm off guard. “When the system went out there, we didn’t plan for how it was going to perform in the open world,” Microsoft’s managing director of research and artificial intelligence, Eric Horvitz, told Fortune in a recent interview.

After Tay’s meltdown, Horvitz immediately asked his senior team working on “natural language processing”—the function central to Tay’s conversations—to figure out what went wrong. The staff quickly determined that basic best practices related to chatbots were overlooked. In programs that were more rudimentary than Tay, there were usually protocols that blacklisted offensive words, but there were no safeguards to limit the type of data Tay would absorb and build on.
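To make that concrete, here is a minimal sketch of the kind of blocklist gate rudimentary chatbots applied before absorbing or echoing user language. The term list, names, and structure are hypothetical illustrations, not Microsoft's actual protocol.

```python
# Illustrative only: a crude blocklist gate of the sort simpler chatbots used.
# The term list and function names are hypothetical placeholders.
BLOCKED_TERMS = {"offensive_term_1", "offensive_term_2"}

def passes_gate(utterance: str) -> bool:
    """True if no blocked term appears in the (lowercased) utterance."""
    words = set(utterance.lower().split())
    return words.isdisjoint(BLOCKED_TERMS)

training_memory: list[str] = []

def absorb(utterance: str) -> None:
    """Only add user language to the bot's imitation memory if it passes the gate.
    Tay lacked an equivalent check on what it would absorb and build on."""
    if passes_gate(utterance):
        training_memory.append(utterance)
    # Offensive input is dropped instead of being learned and repeated.
```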
Today, Horvitz contends, he can “love the example” of Tay—a humbling moment that Microsoft could learn from. Microsoft now deploys far more sophisticated social chatbots around the world, including Ruuh in India, and Rinna in Japan and Indonesia. In the U.S., Tay has been succeeded by a social-bot sister, Zo. Some are now voice-based, the way Apple’s Siri or Amazon’s Alexa are. In China, a chatbot called Xiaoice is already “hosting” TV shows and sending chatty shopping tips to convenience store customers.

Still, the company is treading carefully. It rolls the bots out slowly, Horvitz explains, and closely monitors how they are behaving with the public as they scale. But it’s sobering to realize that, even though A.I. tech has improved exponentially in the intervening two years, the work of policing the bots’ behavior never ends. The company’s staff constantly monitors the dialogue for any changes in its behavior. And those changes keep coming. In its early months, for example, Zo had to be tweaked and tweaked again after separate incidents in which it referred to Microsoft’s flagship Windows software as “spyware” and called the Koran, Islam’s foundational text, “very violent.”

To be sure, Tay and Zo are not our future robot overlords. They’re relatively primitive programs occupying the parlor-trick end of the research spectrum, cartoon shadows of what A.I. can accomplish. But their flaws highlight both the power and the potential pitfalls of software imbued with even a sliver of artificial intelligence. And they exemplify more insidious dangers that are keeping technologists awake at night, even as the business world prepares to entrust ever more of its future to this revolutionary new technology.

“You get your best practices in place, and hopefully those things will get more and more rare,” Horvitz says. With A.I. rising to the top of every company’s tech wish list, figuring out those practices has never been more urgent.

FEW DISPUTE that we’re on the verge of a corporate A.I. gold rush. By 2021, research firm IDC predicts, organizations will spend $52.2 billion annually on A.I.-related products—and economists and analysts believe they’ll realize many billions more in savings and gains from that investment. Some of that bounty will come from the reduction in human headcount, but far more will come from enormous efficiencies in matching product to customer, drug to patient, solution to problem. Consultancy PwC estimates that A.I. could contribute up to $15.7 trillion to the global economy in 2030, more than the combined output of China and India today.

The A.I. renaissance has been driven in part by advances in “deep-learning” technology. With deep learning, companies feed their computer networks enormous amounts of information so that they recognize patterns more quickly, and with less coaching (and eventually, perhaps, no coaching) from humans. Facebook, Google, Microsoft, Amazon, and IBM are among the giants already using deep-learning tech in their products. Apple’s Siri and Google Assistant, for example, recognize and respond to your voice because of deep learning. Amazon uses deep learning to help it visually screen tons of produce that it delivers via its grocery service.
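Stripped of scale, the training idea is simple: instead of hand-coding rules, you show a model labeled examples and let it fit the pattern. The sketch below uses invented data and a small scikit-learn network as a stand-in for a production deep-learning system.

```python
# A toy stand-in for the idea, not production-scale deep learning: the "coaching"
# is just labeled examples, and the network fits the pattern on its own.
from sklearn.neural_network import MLPClassifier

# Hypothetical labeled data: each row is a feature vector, each label a category.
X = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.3]]
y = ["cat", "cat", "dog", "dog"]

model = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
model.fit(X, y)                      # no hand-written rules, only examples
print(model.predict([[0.85, 0.2]]))  # the trained network generalizes to new inputs
```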
And in the near future, companies of every size hope to use deep-learning-powered software to mine their data and find gems buried too deep for meager human eyes to spot. They envision A.I.-driven systems that can scan thousands of radiology images to more quickly detect illnesses, or screen multitudes of résumés to save time for beleaguered human resources staff. In a technologist’s utopia, businesses could use A.I. to sift through years of data to better predict their next big sale, a pharmaceutical giant could cut down the time it takes to discover a blockbuster drug, or auto insurers could scan terabytes of car accidents and automate claims.

But for all their enormous potential, A.I.-powered systems have a dark side. Their decisions are only as good as the data that humans feed them. As their builders are learning, the data used to train deep-learning systems isn’t neutral. It can easily reflect the biases—conscious and unconscious—of the people who assemble it. And sometimes data can be slanted by history, encoding trends and patterns that reflect centuries-old discrimination. A sophisticated algorithm can scan a historical database and conclude that white men are the most likely to succeed as CEOs; it can’t be programmed (yet) to recognize that, until very recently, people who weren’t white men seldom got the chance to be CEOs. Blindness to bias is a fundamental flaw in this technology, and while executives and engineers speak about it only in the most careful and diplomatic terms, there’s no doubt it’s high on their agenda.

The most powerful algorithms being used today “haven’t been optimized for any definition of fairness,” says Deirdre Mulligan, an associate professor at the University of California at Berkeley who studies ethics in technology. “They have been optimized to do a task.” A.I. converts data into decisions with unprecedented speed—but what scientists and ethicists are learning, Mulligan says, is that in many cases “the data isn’t fair.”

Adding to the conundrum is that deep learning is much more complex than the conventional algorithms that are its predecessors—making it trickier for even the most sophisticated programmers to understand exactly how an A.I. system makes any given choice. Like Tay, A.I. products can morph to behave in ways that their creators don’t intend and can’t anticipate. And because the creators and users of these systems religiously guard the privacy of their data and algorithms, citing competitive concerns about proprietary technology, it’s hard for external watchdogs to determine what problems could be embedded in any given system.

The fact that tech that includes these black-box mysteries is being productized and pitched to companies and governments has more than a few researchers and activists deeply concerned. “These systems are not just off-the-shelf software that you can buy and say, ‘Oh, now I can do accounting at home,’ ” says Kate Crawford, principal researcher at Microsoft and codirector of the AI Now Institute at New York University. “These are very advanced systems that are going to be influencing our core social institutions.”

THOUGH THEY MAY not think of it as such, most people are familiar with at least one A.I. breakdown: the spread of fake news on Facebook’s ubiquitous News Feed in the run-up to the 2016 U.S. presidential election.

The social media giant and its data scientists didn’t create flat-out false stories. But the algorithms powering the News Feed weren’t designed to filter “false” from “true”; they were intended to promote content personalized to a user’s individual taste. While the company doesn’t disclose much about its algorithms (again, they’re proprietary), it has acknowledged that the calculus involves identifying stories that other users of similar tastes are reading and sharing. The result: Thanks to an endless series of what were essentially popularity contests, millions of people’s personal News Feeds were populated with fake news primarily because their peers liked it.
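The sketch below is a deliberately simplified illustration of taste-based ranking, with invented users and story IDs rather than Facebook's actual algorithm. The structural point is that nothing in the score asks whether a story is true; whatever similar users share floats to the top.

```python
from collections import Counter

# Hypothetical share histories; users and story IDs are invented.
shares = {
    "alice": {"fake_story_7", "real_story_1"},
    "bob":   {"fake_story_7"},
    "carol": {"fake_story_7", "real_story_2"},
    "dave":  {"real_story_1"},
}

def similarity(a: set, b: set) -> float:
    """Jaccard overlap between two users' share histories."""
    return len(a & b) / len(a | b) if a | b else 0.0

def rank_feed(user: str) -> list:
    """Score stories shared by other users, weighted by how similar their tastes are.
    Truthfulness never enters the calculation."""
    scores = Counter()
    for other, their_shares in shares.items():
        if other == user:
            continue
        w = similarity(shares[user], their_shares)
        for story in their_shares - shares[user]:
            scores[story] += w
    return scores.most_common()

print(rank_feed("dave"))  # the widely shared fake story ranks first for dave
```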
While Facebook offers an example of how individual choices can interact toxically with A.I., researchers worry more about how deep learning could read, and misread, collective data. Timnit Gebru, a postdoctoral researcher who has studied the ethics of algorithms at Microsoft and elsewhere, says she’s concerned about how deep learning might affect the insurance market—a place where the interaction of A.I. and data could put minority groups at a disadvantage. Imagine, for example, a data set about auto accident claims. The data shows that accidents are more likely to take place in inner cities, where densely packed populations create more opportunities for fender benders. Inner cities also tend to have disproportionately high numbers of minorities among their residents.

A deep-learning program, sifting through data in which these correlations were embedded, could “learn” that there was a relationship between belonging to a minority and having car accidents, and could build that lesson into its assumptions about all drivers of color. In essence, that insurance A.I. would develop a racial bias. And that bias could get stronger if, for example, the system were to be further “trained” by reviewing photos and video from accidents in inner-city neighborhoods. In theory, the A.I. would become more likely to conclude that a minority driver is at fault in a crash involving multiple drivers. And it’s more likely to recommend charging a minority driver higher premiums, regardless of her record.

It should be noted that insurers say they do not discriminate or assign rates based on race. But the inner-city hypothetical shows how data that seems neutral (facts about where car accidents happen) can be absorbed and interpreted by an A.I. system in ways that create new disadvantages (algorithms that charge higher prices to minorities, regardless of where they live, based on their race).
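A stripped-down version of that hypothetical is sketched below, with all numbers invented. The pricing rule never sees race, only a seemingly neutral local claim rate, yet the quotes it produces can still fall along racial lines when neighborhood and race are correlated.

```python
# Illustration of the article's hypothetical (all figures invented): a pricing rule
# keyed to a "neutral" feature can reproduce a racial disparity when that feature
# is correlated with race.
claims_per_1000 = {"inner_city": 42, "suburb": 18}   # hypothetical local claim rates
base_premium = 500.0

def quoted_premium(neighborhood: str, at_fault_claims: int) -> float:
    """Premium scales with the local claim rate plus the driver's own record."""
    local_rate = claims_per_1000[neighborhood]
    return base_premium * (1 + local_rate / 100) + 200 * at_fault_claims

# Two drivers with identical, spotless records:
print(quoted_premium("inner_city", at_fault_claims=0))  # 710.0
print(quoted_premium("suburb", at_fault_claims=0))      # 590.0
# If minority drivers are concentrated in the inner city, the gap falls along
# racial lines even though race never appears in the formula.
```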
What’s more, Gebru notes, given the layers upon layers of data that go into a deep-learning system’s decision-making, A.I.-enabled software could make decisions like this without engineers realizing how or why. “These are things we haven’t even thought about, because we are just starting to uncover biases in the most rudimentary algorithms,” she says.

What distinguishes modern A.I.-powered software from earlier generations is that today’s systems “have the ability to make legally significant decisions on their own,” says Matt Scherer, a labor and employment lawyer at Littler Mendelson who specializes in A.I. The idea of not having a human in the loop to make the call about key outcomes alarmed Scherer when he started studying the field. If flawed data leads a deep-learning-powered X-ray to miss an overweight man’s tumor, is anyone responsible? “Is anyone looking at the legal implications of these things?” Scherer asks himself.

AS BIG TECH PREPARES to embed deep-learning technology in commercial software for customers, questions like this are moving from the academic “what if?” realm to the front burner. In 2016, the year of the Tay misadventure, Microsoft created an internal group called Aether, which stands for AI and Ethics in Engineering and Research, chaired by Eric Horvitz. It’s a cross-disciplinary group, drawing representatives from engineering, research, policy, and legal teams, and machine-learning bias is one of its top areas of discussion. “Does Microsoft have a viewpoint on whether, for example, face-recognition software should be applied in sensitive areas like criminal justice and policing?” Horvitz muses, describing some of the topics the group is discussing. “Is the A.I. technology good enough to be used in this area, or will the failure rates be high enough where there has to be a sensitive, deep consideration for the costs of the failures?”

Joaquin Quiñonero Candela leads Facebook’s Applied Machine Learning group, which is responsible for creating the company’s A.I. technologies. Among many other functions, Facebook uses A.I. to weed spam out of people’s News Feeds. It also uses the technology to help serve stories and posts tailored to their interests—putting Candela’s team adjacent to the fake-news crisis. Candela calls A.I. “an accelerator of history,” in that the technology is “allowing us to build amazing tools that augment our ability to make decisions.” But as he acknowledges, “It is in decision-making that a lot of ethical questions come into play.”

Facebook’s struggles with its News Feed show how difficult it can be to address ethical questions once an A.I. system is already powering a product. Microsoft was able to tweak a relatively simple system like Tay by adding profanities or racial epithets to a blacklist of terms that its algorithm should ignore. But such an approach wouldn’t work when trying to separate “false” from “true”—there are too many judgment calls involved. Facebook’s efforts to bring in human moderators to vet news stories—by, say, excluding articles from sources that frequently published verifiable falsehoods—exposed the company to charges of censorship. Today, one of Facebook’s proposed remedies is to simply show less news in the News Feed and instead highlight baby pictures and graduation photos—a winning-by-retreating approach.

Therein lies the heart of the challenge: The dilemma for tech companies isn’t so much a matter of tweaking an algorithm or hiring humans to babysit it; rather, it’s about human nature itself. The real issue isn’t technical or even managerial—it’s philosophical. Deirdre Mulligan, the Berkeley ethics professor, notes that it’s difficult for computer scientists to codify fairness into software, given that fairness can mean different things to different people. Mulligan also points out that society’s conception of fairness can change over time. And when it comes to one widely shared ideal of fairness—namely, that everybody in a society ought to be represented in that society’s decisions—historical data is particularly likely to be flawed and incomplete.
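One way to see why fairness resists a single encoding: two standard formalizations from the fairness literature can disagree about the same predictions. The data below is invented; the two metrics used, demographic parity (comparing selection rates) and equal opportunity (comparing true-positive rates among the qualified), are assumed definitions, not anything prescribed by the companies in this story.

```python
# Invented hiring predictions for two groups: (group, truly_qualified, model_selects)
data = [
    ("A", True, True), ("A", True, True), ("A", False, False), ("A", False, False),
    ("B", True, True), ("B", True, True), ("B", True, False), ("B", True, False),
]

def selection_rate(group: str) -> float:
    """Share of the group the model selects (demographic parity compares these)."""
    rows = [r for r in data if r[0] == group]
    return sum(r[2] for r in rows) / len(rows)

def true_positive_rate(group: str) -> float:
    """Share of qualified members the model selects (equal opportunity compares these)."""
    qualified = [r for r in data if r[0] == group and r[1]]
    return sum(r[2] for r in qualified) / len(qualified)

print(selection_rate("A"), selection_rate("B"))          # 0.5 vs 0.5 -> parity holds
print(true_positive_rate("A"), true_positive_rate("B"))  # 1.0 vs 0.5 -> opportunity doesn't
```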
One of the Microsoft Aether group’s thought experiments illustrates the conundrum. It involves A.I. tech that sifts through a big corpus of job applicants to pick out the perfect candidate for a top executive position. Programmers could instruct the A.I. software to scan the characteristics of a company’s best performers. Depending on the company’s history, it might well turn out that all of the best performers—and certainly all the highest ranking executives—were white males. This might overlook the possibility that the company had a history of promoting only white men (for generations, most companies did), or has a culture in which minorities or women feel unwelcome and leave before they rise.

Anyone who knows anything about corporate history would recognize these flaws—but most algorithms wouldn’t. If A.I. were to automate job recommendations, Horvitz says, there’s always a chance that it can “amplify biases in society that we may not be proud of.”

FEI-FEI LI, the chief scientist for A.I. for Google’s cloud-computing unit, says that bias in technology “is as old as human civilization”—and can be found in a lowly pair of scissors. “For centuries, scissors were designed by right-handed people, used by mostly right-handed people,” she explains. “It took someone to recognize that bias and recognize the need to create scissors for left-handed people.” Only about 10% of the world’s people are left-handed—and it’s human nature for members of the dominant majority to be oblivious to the experiences of other groups.

That same dynamic, it turns out, is present in some of A.I.’s other most notable recent blunders. Consider the A.I.-powered beauty contest that Russian scientists conducted in 2016. Thousands of people worldwide submitted selfies for a contest in which computers would judge their beauty based on factors like the symmetry of their faces.

But of the 44 winners the machines chose, only one had dark skin. An international ruckus ensued, and the contest’s operators later attributed the apparent bigotry of the computers to the fact that the data sets they used to train them did not contain many photos of people of color. The computers essentially ignored photos of people with dark skin and deemed those with lighter skin more “beautiful” because they represented the majority.

This bias-through-omission turns out to be particularly pervasive in deep-learning systems in which image recognition is a major part of the training process. Joy Buolamwini, a researcher at the MIT Media Lab, recently collaborated with Gebru, the Microsoft researcher, on a paper studying gender-recognition technologies from Microsoft, IBM, and China’s Megvii. They found that the tech consistently made more accurate identifications of subjects with photos of lighter-skinned men than with those of darker-skinned women.

Such algorithmic gaps may seem trivial in an online beauty contest, but Gebru points out that such technology can be used in much more high-stakes situations. “Imagine a self-driving car that doesn’t recognize when it ‘sees’ black people,” Gebru says. “That could have dire consequences.”
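The audit logic itself is straightforward to reproduce in spirit. The sketch below uses invented results, not the paper's data: an aggregate accuracy number can look respectable while one subgroup fares far worse, which is why such audits report results broken out per group.

```python
from collections import defaultdict

# Invented predictions from a hypothetical gender classifier: (subgroup, correct?)
results = (
    [("lighter-skinned men", True)] * 95 + [("lighter-skinned men", False)] * 5
    + [("darker-skinned women", True)] * 70 + [("darker-skinned women", False)] * 30
)

def accuracy(rows) -> float:
    return sum(correct for _, correct in rows) / len(rows)

by_group = defaultdict(list)
for group, correct in results:
    by_group[group].append((group, correct))

print("overall:", accuracy(results))   # 0.825 -- looks fine in aggregate
for group, rows in by_group.items():
    print(group, accuracy(rows))       # 0.95 vs 0.70 -- the gap the audit surfaces
```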
The Gebru-Buolamwini paper is making waves. Both Microsoft and IBM have said they have taken actions to improve their image-recognition technologies in response to the audit. While those two companies declined to be specific about the steps they were taking, other companies that are tackling the problem offer a glimpse of what tech can do to mitigate bias.

When Amazon started deploying algorithms to weed out rotten fruit, it needed to work around a sampling-bias problem. Visual-recognition algorithms are typically trained to figure out what, say, strawberries are “supposed” to look like by studying a huge database of images. But pictures of rotten berries, as you might expect, are relatively rare compared with glamour shots of the good stuff. And unlike humans, whose brains tend to notice and react strongly to “outliers,” machine-learning algorithms tend to discount or ignore them.

To adjust, explains Ralf Herbrich, Amazon’s director of artificial intelligence, the online retail giant is testing a computer science technique called oversampling. Machine-learning engineers can direct how the algorithm learns by assigning heavier statistical “weights” to underrepresented data, in this case the pictures of the rotting fruit. The result is that the algorithm ends up being trained to pay more attention to spoiled food than that food’s prevalence in the data library might suggest.

Herbrich points out that oversampling can be applied to algorithms that study humans too (though he declined to cite specific examples of how Amazon does so). “Age, gender, race, nationality—they are all dimensions that you specifically have to test the sampling biases for in order to inform the algorithm over time,” Herbrich says. To make sure that an algorithm used to recognize faces in photos didn’t discriminate against or ignore people of color, or older people, or overweight people, you could add weight to photos of such individuals to make up for the shortage in your data set.
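As a generic sketch of that weighting idea (toy data and a plain logistic model, not Amazon's system), the code below up-weights the rare class so the learner stops treating it as noise; literally duplicating the rare rows, i.e. oversampling, has a similar effect.

```python
# Toy illustration of up-weighting an underrepresented class; data and feature
# are invented stand-ins for a produce-screening problem.
from sklearn.linear_model import LogisticRegression

X = [[0.9], [0.8], [0.85], [0.95], [0.9], [0.88], [0.2], [0.3]]  # a "freshness" score
y = ["good", "good", "good", "good", "good", "good", "rotten", "rotten"]

# Weight each example inversely to its class frequency: 6 good vs 2 rotten.
counts = {label: y.count(label) for label in set(y)}
weights = [len(y) / counts[label] for label in y]

clf = LogisticRegression()
clf.fit(X, y, sample_weight=weights)   # rotten examples now count 3x as much
print(clf.predict([[0.25]]))           # low-freshness items are more likely flagged "rotten"
```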
Other engineers are focusing further “upstream”—making sure that the underlying data used to train algorithms is inclusive and free of bias, before it’s even deployed. In image recognition, for example, the millions of images used to train deep-learning systems need to be examined and labeled before they are fed to computers. Radha Basu, the CEO of data-training startup iMerit, whose clients include Getty Images and eBay, explains that the company’s staff of over 1,400 worldwide is trained to label photos on behalf of its customers in ways that can mitigate bias.

Basu declined to discuss how that might play out when labeling people, but she offered other analogies. iMerit staff in India may consider a curry dish to be “mild,” while the company’s staff in New Orleans may describe the same meal as “spicy.” iMerit would make sure both terms appear in the label for a photo of that dish, because to label it as only one or the other would be to build an inaccuracy into the data. Assembling a data set about weddings, iMerit would include traditional Western white-dress-and-layer-cake images—but also shots from elaborate, more colorful weddings in India or Africa.

iMerit’s staff stands out in a different way, Basu notes: It includes people with Ph.D.s, but also less-educated people who struggled with poverty, and 53% of the staff are women. The mix ensures that as many viewpoints as possible are involved in the data labeling process. “Good ethics does not just involve privacy and security,” Basu says. “It’s about bias, it’s about, Are we missing a viewpoint?”

Tracking down that viewpoint is becoming part of more tech companies’ strategic agendas. Google, for example, announced in June that it would open an A.I. research center later this year in Accra, Ghana. “A.I. has great potential to positively impact the world, and more so if the world is well represented in the development of new A.I. technologies,” two Google engineers wrote in a blog post.

A.I. insiders also believe they can fight bias by making their workforces in the U.S. more diverse—always a hurdle for Big Tech. Fei-Fei Li, the Google executive, recently cofounded the nonprofit AI4ALL to promote A.I. technologies and education among girls and women and in minority communities. The group’s activities include a summer program in which campers visit top university A.I. departments to develop relationships with mentors and role models. The bottom line, says AI4ALL executive director Tess Posner: “You are going to mitigate risks of bias if you have more diversity.”

YEARS BEFORE this more diverse generation of A.I. researchers reaches the job market, however, big tech companies will have further imbued their products with deep-learning capabilities. And even as top researchers increasingly recognize the technology’s flaws—and acknowledge that they can’t predict how those flaws will play out—they argue that the potential benefits, social and financial, justify moving forward.

“I think there’s a natural optimism about what technology can do,” says Candela, the Facebook executive. Almost any digital tech can be abused, he says, but adds, “I wouldn’t want to go back to the technology state we had in the 1950s and say, ‘No, let’s not deploy these things because they can be used wrong.’ ”

Horvitz, the Microsoft research chief, says he’s confident that groups like his Aether team will help companies solve potential bias problems before they cause trouble in public. “I don’t think anybody’s rushing to ship things that aren’t ready to be used,” he says. If anything, he adds, he’s more concerned about “the ethical implications of not doing something.” He invokes the possibility that A.I. could reduce preventable medical error in hospitals. “You’re telling me you’d be worried that my system [showed] a little bit of bias once in a while?” Horvitz asks. “What are the ethics of not doing X when you could’ve solved a problem with X and saved many, many lives?”

The watchdogs’ response boils down to: Show us your work. More transparency and openness about the data that goes into A.I.’s black-box systems will help researchers spot bias faster and solve problems more quickly. When an opaque algorithm could determine whether a person can get insurance, or whether that person goes to prison, says Buolamwini, the MIT researcher, “it’s really important that we are testing these systems rigorously, that there are some levels of transparency.”

Indeed, it’s a sign of progress that few people still buy the idea that A.I. will be infallible. In the web’s early days, notes Tim Hwang, a former Google public policy executive for A.I. who now directs the Harvard-MIT Ethics and Governance of Artificial Intelligence initiative, technology companies could say they are “just a platform that represents the data.” Today, “society is no longer willing to accept that.”

This article originally appeared in the July 1, 2018 issue of Fortune.