Researchers say they have made an important finding that could have big implications for the study of computer brains and, possibly, human ones too.
OpenAI, the San Francisco–based A.I. research company, says that it has advanced methods for peering into the inner workings of artificial intelligence software known as neural networks, helping to make their notoriously opaque decision-making more interpretable. In the process they have uncovered that individual neurons in a large neural network can encode a particular concept, a finding that parallels one that neuroscientists have glimpsed in the human brain.
Neural networks are a kind of machine-learning software loosely modeled on the human brain. The use of these networks has been responsible for most of the rapid advances in artificial intelligence in the past eight years, including the speech recognition found in digital assistants, facial recognition software, and new ways to discover drugs.
But one drawback of large neural networks is that it can be challenging to understand the rationale behind their decisions, even for the machine-learning experts who create them. As a result, it is difficult to know exactly when and how this software can fail. And that has made people understandably reluctant to use such A.I. software, even when these systems seem to outperform other kinds of automated software or humans. This has particularly been true in medical and financial settings, where a wrong decision may cost money or even lives.
“Because we don’t understand how these neural networks work, it can be hard to reason about their errors,” says Ilya Sutskever, OpenAI’s cofounder and chief scientist. “We don’t know if they are reliable or if they have hidden vulnerabilities that are not apparent from testing.”
But researchers at the company recently used several techniques to probe the inner workings of a large neural network they had created for identifying images and putting them into broad category buckets. The researchers discovered that individual neurons in the network were associated with one particular label or concept.
This was significant, OpenAI said in a blog post discussing its research, because it echoed findings from a landmark 2005 neuroscience study that found the human brain may have “grandmother” neurons that fire in response to one very specific image or concept. For instance, the neuroscientists discovered that one subject in their study seemed to have a neuron that was associated with the actress Halle Berry. The neuron fired when the person was shown an image of Berry, but the same neuron was also activated when the person heard the words “Halle Berry,” or when shown images associated with Berry’s iconic roles.
OpenAI’s research focused on an A.I. system it debuted in January that can perform a wide variety of image classification tasks with a high degree of accuracy, without being specifically trained for those tasks with labeled data sets. The system, called CLIP (short for Contrastive Language-Image Pre-training), ingested 400 million images from the Internet, paired with their captions. From this information, the technology learned to predict which of 32,768 text snippet labels was most likely to be associated with any given image, even images it had never encountered before. For instance, show CLIP a picture of a bowl of guacamole, and not only is it able to correctly label the image as guacamole, but it also knows that guacamole is “a type of food.”
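To make that mechanic concrete, the sketch below shows the zero-shot scoring step using the open-source `clip` package OpenAI published alongside the model (github.com/openai/CLIP); the guacamole file name and the tiny candidate-label list are illustrative assumptions, not the paper's actual 32,768-snippet setup.

```python
# A minimal zero-shot labeling sketch, assuming the open-source `clip` package
# (pip install git+https://github.com/openai/CLIP.git). The image file and the
# short candidate-caption list are illustrative assumptions.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

labels = [
    "a photo of guacamole, a type of food",
    "a photo of a dog",
    "a photo of a piggy bank",
]
text = clip.tokenize(labels).to(device)
image = preprocess(Image.open("guacamole.jpg")).unsqueeze(0).to(device)

with torch.no_grad():
    # CLIP embeds the image and each caption, then scores their similarity.
    logits_per_image, _ = model(image, text)
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()[0]

for label, p in zip(labels, probs):
    print(f"{p:.3f}  {label}")
```

The point is simply that classification here is a similarity ranking over captions, which is why the set of labels can be swapped at inference time without retraining.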
In the new research, OpenAI used techniques that reverse engineer what makes a particular artificial neuron fire most strongly, building up a picture of that neuron’s “Platonic ideal” for a given concept. For instance, OpenAI probed one neuron associated with the concept “gold” and found that the images that most activated it contained shiny, yellow, coin-like objects as well as the text “gold” itself. A neuron affiliated with “Spider-Man” was triggered in response to photos of a person dressed up as the comic book hero, but also to the word “spider.” Interestingly, a neuron affiliated with the concept “yellow” fired in response to the words “banana” and “lemon,” as well as to the color and to the word “yellow” itself.
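OpenAI's write-up combines several interpretability tools, including feature visualization and retrieving the dataset images that most excite a neuron. As a rough illustration of the generic feature-visualization idea only, here is a sketch that runs gradient ascent on an input image to excite one unit of an off-the-shelf torchvision model rather than CLIP itself; the model, layer, and channel choices are arbitrary assumptions, not anything from the paper.

```python
# A generic feature-visualization sketch (not OpenAI's exact procedure):
# start from random noise and run gradient ascent to find an input that
# strongly activates one chosen unit of a pretrained vision model.
import torch
import torchvision.models as models

model = models.resnet50(pretrained=True).eval()
for p in model.parameters():          # freeze the network; only the input is optimized
    p.requires_grad_(False)

activation = {}
model.layer4.register_forward_hook(lambda m, i, o: activation.update(value=o))
target_channel = 42                   # arbitrary choice of unit to "interview"

image = torch.randn(1, 3, 224, 224, requires_grad=True)
optimizer = torch.optim.Adam([image], lr=0.05)

for _ in range(200):
    optimizer.zero_grad()
    model(image)
    # Gradient ascent: maximize the mean activation of the chosen channel.
    loss = -activation["value"][0, target_channel].mean()
    loss.backward()
    optimizer.step()

# `image` now approximates what that unit "wants to see"; real pipelines add
# regularization and image transformations to keep the result recognizable.
```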
“This is maybe evidence that these neural networks are not as incomprehensible as we might think,” Gabriel Goh, the OpenAI researcher who led the team working on interpreting CLIP’s conceptual reasoning, told Fortune. In the future, such methods could be used to help companies using neural networks to understand how they arrive at decisions and when a system is likely to fail or exhibit bias. It might also point a way for neuroscientists to use artificial neural networks to investigate the ways in which human learning and concept formation may take place.
Not every CLIP neuron was associated with a distinct concept. Many fired in response to a number of different conceptual categories. And some neurons seemed to fire together, possibly meaning that they represented a complex concept.
OpenAI said that some concepts that the researchers expected the system to have a neuron for were absent. Even though CLIP can accurately identify photographs of San Francisco and can often even identify the neighborhood of the city in which they were taken, the neural network did not seem to have a neuron associated with a concept of “San Francisco” or even “city” or “California.” “We believe this information to be encoded within the activations of the model somewhere, but in a more exotic way,” OpenAI said in its blog post.
In a demonstration that this technique can be used to uncover hidden biases in neural networks, the researchers discovered that CLIP also had what OpenAI dubbed a “Middle East” neuron that fired in response to images and words associated with the region, but also in response to those associated with terrorism, the company said. It had an “immigration” neuron that responded to Latin America. And the researchers found a neuron that fired for both dark-skinned people and gorillas, which OpenAI noted was similar to the racist photo-tagging errors that had previously caused problems for neural network–based image classification systems at Google.
Racial and gender biases hidden in large A.I. models, especially those trained on massive amounts of data culled from the Internet, have become an increasing area of concern for A.I. ethics researchers and civil society organizations.
The researchers also said that their methods had uncovered a particular bias in how CLIP makes decisions that would make it possible for someone to fool the A.I. into making incorrect identifications. The system associated the text of a word or symbol so strongly with the concept it stands for that if someone put that symbol or word on a different object, the system would misclassify it. For instance, a dog with a big “$$$” sign on it might be misclassified as a piggy bank.
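One rough way to check for that failure mode is to paste the distracting text onto an image and re-run the same zero-shot scoring as above. The sketch below does that with the same assumed `clip` package; the file name and label strings are illustrative, and overlaying rendered text with Pillow is only a crude stand-in for the "typographic attacks" the researchers describe.

```python
# Crude probe of the "typographic attack": overlay a dollar symbol on a dog
# photo and see whether CLIP's ranking shifts toward "piggy bank."
import torch
import clip
from PIL import Image, ImageDraw

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)
labels = ["a photo of a dog", "a photo of a piggy bank"]
text = clip.tokenize(labels).to(device)

def rank(img):
    batch = preprocess(img).unsqueeze(0).to(device)
    with torch.no_grad():
        logits, _ = model(batch, text)
    probs = logits.softmax(dim=-1).cpu().numpy()[0]
    return sorted(zip(labels, probs), key=lambda pair: -pair[1])

dog = Image.open("dog.jpg").convert("RGB")   # illustrative file name
tagged = dog.copy()
ImageDraw.Draw(tagged).text((20, 20), "$$$", fill="white")  # the pasted symbol

print("clean :", rank(dog))
print("tagged:", rank(tagged))   # does "piggy bank" climb?
```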
“I think you definitely see a lot of stereotyping in the model,” Goh said. Sutskever said that being able to identify these biases was a first step toward trying to correct them, something he thought could be accomplished by providing the neural network with a relatively small number of additional training examples that are specifically designed to break the inappropriate correlation the system has learned.