OpenAI's chief scientist says that with GPT-4, the company has a "recipe for producing magic"

Jeremy Kahn
2023-03-17

The official release of GPT-4 had been widely anticipated among those who follow A.I. developments.

Image credit: JAKUB PORZYCKI/NURPHOTO VIA GETTY IMAGES

OpenAI CEO Sam Altman. Image credit: JOVELLE TAMAYO/FOR THE WASHINGTON POST VIA GETTY IMAGES

So it’s finally here: GPT-4. This is the latest and greatest artificial intelligence system from OpenAI, and a successor to the A.I. model that powers the wildly popular ChatGPT.

OpenAI, the San Francisco A.I. lab that is now closely tied to Microsoft, says that GPT-4 is much more capable than the GPT-3.5 model underpinning the consumer version of ChatGPT. For one thing, GPT-4 is multi-modal: it can take in images as well as text, although it only outputs text. This opens up the ability of the A.I. model to “understand” photos and scenes. (Although for now this visual understanding capability is only being offered through OpenAI’s partnership with Be My Eyes, a free mobile app for the visually impaired.)

The new model performs much better than GPT-3.5 on a range of benchmark tests for natural language processing and computer vision algorithms. It also performs very well on a battery of diverse tests designed for humans, including a very impressive score on a simulated bar exam as well as a five out of five on a wide range of Advanced Placement exams, from Math to Art History. (Interestingly, the system scores poorly on both the AP English Literature and AP English Composition exams, and there are already questions from machine learning experts about whether there may be less than meets the eye to GPT-4’s stellar exam performance.)

The model, according to OpenAI, is 40% more likely to return factual answers to questions—although it may still in some cases simply invent information, a phenomenon A.I. researchers call “hallucination.” It is also less likely to jump the guardrails OpenAI has given the model to try to keep it from spewing toxic or biased language, or recommending actions that might cause harm. OpenAI said GPT-4 is more likely to refuse such requests than GPT-3.5 was.

Still, GPT-4 has many of the same potential risks and flaws as other large language models. It isn’t entirely reliable. Its answers are unpredictable. It can be used to produce misinformation. It can still be pushed to jump its guardrails and give outputs that might be unsafe, either because they might be hurtful to the person reading the output or because they might encourage the person to take actions that would harm themselves or others. It can be used, for instance, to help someone find ways to make improvised chemical weapons or explosives from household products.

Because of this, OpenAI cautioned users that “Great care should be taken when using language model outputs, particularly in high-stakes contexts, with the exact protocol (such as human review, grounding with additional context, or avoiding high-stakes uses altogether) matching the needs of a specific use-case.” And yet, OpenAI has released the model as a paid service to ChatGPT Plus customers and businesses purchasing services through its cloud-based application programming interface (or API).
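
To make the distribution model concrete, the snippet below is a minimal sketch of how a business might call GPT-4 through the cloud API, assuming the openai Python client roughly as it existed around the GPT-4 launch; the model name, parameters, and prompt are illustrative, not a statement of OpenAI's exact product offering.

```python
# Minimal sketch: calling GPT-4 through OpenAI's cloud API.
# Assumes the `openai` Python client (circa the GPT-4 launch) and an API key in
# the OPENAI_API_KEY environment variable; model name and parameters are illustrative.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

response = openai.ChatCompletion.create(
    model="gpt-4",  # illustrative model identifier
    messages=[
        {"role": "system", "content": "You are a careful assistant. If unsure, say so."},
        {"role": "user", "content": "Summarize the key changes between GPT-3.5 and GPT-4."},
    ],
    temperature=0.2,  # lower temperature for more deterministic answers
)

print(response["choices"][0]["message"]["content"])
```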

GPT-4’s release had been widely anticipated among those who follow A.I. developments. While ChatGPT took almost everyone by surprise when OpenAI released it in late November 2022, it was widely known for at least a year that OpenAI was working on something called GPT-4, although there had been wild speculation about exactly what it would be. In fact, after ChatGPT became an unexpected viral sensation, massively ramping up hype around A.I., Sam Altman, the CEO of OpenAI, felt it necessary to try to tamp down expectations surrounding GPT-4’s imminent release. “The GPT-4 rumor mill is a ridiculous thing. I don’t know where it all comes from,” Altman said in an interview at an event in San Francisco in January. Referring to the idea of artificial general intelligence (or AGI), the kind of machine superintelligence that has been a staple of science fiction, he said, “people are begging to be disappointed and they will be. The hype is just like… We don’t have an actual AGI and that’s sort of what’s expected of us.”

On March 15, I talked to several of the OpenAI researchers who helped build GPT-4 about its capabilities, limitations, and how they built it. The researchers spoke in general terms about the methods they used, but there is much about GPT-4 they are keeping under wraps, including the size of the model, exactly what data was used to train it, how many specialized computer chips (graphics processing units, or GPUs) were needed to train and run it, what its carbon footprint is, and more.

OpenAI was co-founded by Elon Musk, who has said he chose the name because he wanted the new research lab to be dedicated to democratizing A.I. and being transparent, publishing all its research. Over the years, OpenAI has increasingly moved away from its founding dedication to transparency, and with little detail about GPT-4 being released, some computer scientists quipped that the lab should change its name. “I think we can call it shut on ‘Open’ AI,” tweeted Ben Schmidt, the vice president of design at a company called Nomic AI. “The 98 page paper introducing GPT-4 proudly declares that they’re disclosing *nothing* about the contents of their training set.”

Ilya Sutskever, OpenAI’s chief scientist, told Fortune the reason for this secrecy was primarily because “it is simply a competitive environment” and the company did not want commercial rivals to quickly replicate its achievement. He also said that in the future, as A.I. models became even more capable and “those capabilities could be easily very harmful,” it will be important for safety reasons to limit information about how the models were created.

At times, Sutskever spoke of GPT-4 in terms that seemed designed to sidestep serious discussion of its inner workings. He described a “recipe for producing magic” when discussing the high-level process of creating generative pre-trained transformers, or GPTs, the basic model architecture that underpins most large language models. “GPT-4 is the latest manifestation of this magic,” Sutskever said. In response to a question about how OpenAI had managed to reduce GPT-4’s tendency to hallucinate, Sutskever said, “We just teach it not to hallucinate.”

Six months of fine-tuning for safety and ease of use

Two of Sutskever’s OpenAI colleagues did provide slightly more detail on how OpenAI “just taught it not to hallucinate.” Jakub Pachocki, a member of OpenAI’s technical staff, said the model’s increased size alone, and the larger amount of data it ingested during pre-training, seemed to be part of the reason for its increased accuracy. Ryan Lowe, who co-leads OpenAI’s team that works on “alignment,” or making sure A.I. systems do what humans want them to and don’t do things we don’t want them to do, said that OpenAI also spent about six months after pre-training GPT-4 fine-tuning the model to be both safer and easier to use. One method it used, he said, was to collect human feedback on GPT-4’s outputs and then use that feedback to push the model toward generating responses it predicted were more likely to get positive feedback from these human reviewers. This process, called “reinforcement learning from human feedback,” was part of what made ChatGPT such an engaging and useful chatbot.
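
For readers who want a concrete picture of that feedback loop, here is a deliberately tiny, self-contained sketch: fit a simple reward model on human approve/reject labels, then favor candidate responses the reward model scores highly. Everything in it is illustrative; OpenAI has not published the details of its reward models or the policy optimization it runs at scale.

```python
# Toy sketch of the RLHF idea described above: learn a reward model from human
# thumbs-up/thumbs-down labels, then prefer responses the reward model scores highly.
import math
from collections import Counter

# 1) Human feedback: (response_text, label) where 1 = reviewer approved.
feedback = [
    ("The capital of France is Paris.", 1),
    ("I am not sure, but I can look that up for you.", 1),
    ("The capital of France is definitely Lyon.", 0),
    ("Trust me, just take my word for it.", 0),
]

def features(text):
    """Bag-of-words features for a toy linear reward model."""
    return Counter(text.lower().split())

vocab = sorted({w for text, _ in feedback for w in features(text)})
weights = {w: 0.0 for w in vocab}
bias = 0.0

def predicted_reward(text):
    """Predicted probability that a human reviewer would approve this response."""
    f = features(text)
    z = bias + sum(weights.get(w, 0.0) * c for w, c in f.items())
    return 1.0 / (1.0 + math.exp(-z))

# 2) Fit the reward model with plain logistic-regression gradient steps.
for _ in range(200):
    for text, label in feedback:
        err = label - predicted_reward(text)
        bias += 0.1 * err
        for w, c in features(text).items():
            weights[w] += 0.1 * err * c

# 3) "Push the model" (here: simply rerank candidates) toward responses the
#    reward model predicts reviewers would approve of.
candidates = [
    "The capital of France is definitely Lyon.",
    "The capital of France is Paris.",
]
best = max(candidates, key=predicted_reward)
print("chosen response:", best)
```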

Lowe said some of the feedback used to refine GPT-4 came from the experience of ChatGPT users, a sign that getting the chatbot into the hands of hundreds of millions of people before many competitors debuted rival systems may have created a faster-spinning “data flywheel” for OpenAI, giving the company an advantage in building future, advanced A.I. software that its rivals may find hard to match.

OpenAI specifically trained GPT-4 on more examples of accurate question-answering in order to boost the model’s ability to perform that task, and reduce the chances of it hallucinating, Lowe said. He also said that OpenAI used GPT-4 itself to generate simulated conversations and other data that was then fed back into the fine-tuning of GPT-4 to help it hallucinate less. This is another example of the “data flywheel” in action.
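
As a rough illustration of that flywheel, the sketch below mixes a curated question-answer pair with model-generated practice conversations that pass a review step, then writes them out as fine-tuning data. The helper names, the filter, and the JSONL format are assumptions made for illustration; OpenAI has not disclosed its actual data pipeline.

```python
# Sketch of the "data flywheel" idea: combine curated QA pairs with
# model-generated conversations that pass a review step, and write them
# out as a fine-tuning dataset. All names and formats here are illustrative.
import json

curated_qa = [
    {"prompt": "What year did the Apollo 11 mission land on the Moon?",
     "completion": "Apollo 11 landed on the Moon in 1969."},
]

def generate_simulated_dialogue(topic):
    """Stand-in for asking the model itself to produce a practice conversation."""
    return {"prompt": f"Explain {topic} to a beginner.",
            "completion": f"A beginner-friendly explanation of {topic} goes here (model-generated draft)."}

def passes_review(example):
    """Stand-in for a human or automated spot check of the generated answer."""
    return len(example["completion"].strip()) > 20

synthetic = [generate_simulated_dialogue(t) for t in ["photosynthesis", "compound interest"]]
dataset = curated_qa + [ex for ex in synthetic if passes_review(ex)]

with open("finetune_data.jsonl", "w", encoding="utf-8") as f:
    for ex in dataset:
        f.write(json.dumps(ex, ensure_ascii=False) + "\n")

print(f"wrote {len(dataset)} fine-tuning examples")
```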

Is the “magic” reliable enough for release?

Sutskever defended OpenAI’s decision to release GPT-4, despite its limitations and risks. “The model is flawed, ok, but how flawed?” he said. “There are some safety mitigations that exist on the model right now,” he added, explaining that OpenAI judged these guardrails and safety measures to be effective enough to allow the company to release the model. He also noted that OpenAI’s terms and conditions of use prohibited certain malicious uses and that the company now had monitoring procedures in place to try to check that users were not violating those terms. He said this, in combination with GPT-4’s better safety profile on key metrics like hallucinations and the ease with which it could be “jailbroken” or made to bypass guardrails, “made us feel that it is appropriate to proceed with the GPT-4 release, as we’re doing right now.”

In a demonstration for Fortune, OpenAI researchers asked the system to summarize an article about itself, but using only words that start with the letter ‘G’—which GPT-4 was able to do relatively coherently. Sutskever said that GPT-3.5 would have flubbed the task, resorting to some words that did not start with ‘G.’ In another example, GPT-4 was presented with part of the U.S. tax code and then given a scenario about a specific couple and asked to calculate how much tax they owed, with reference to the passage of regulations it had just been given. GPT-4 seemingly came up with the right amount of tax in about a second. (Although I was not able to go back through and double-check its answer.)
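
A constraint like “only words beginning with G” is easy to spot-check mechanically. The short helper below, written here purely as an illustration, flags any word in a passage that breaks the rule.

```python
# Spot-check the constrained-generation demo described above: does a passage
# use only words beginning with the letter "G"?
import re

def only_g_words(text):
    words = re.findall(r"[A-Za-z']+", text)
    offenders = [w for w in words if not w.lower().startswith("g")]
    return len(offenders) == 0, offenders

ok, offenders = only_g_words("Groundbreaking GPT generates genuinely good, grammatical gems.")
print(ok, offenders)   # True, []

ok, offenders = only_g_words("GPT gives great answers")
print(ok, offenders)   # False, ['answers']
```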

Despite impressive demonstrations, some A.I. researchers and technologists say that systems like GPT-4 are still not reliable enough for many enterprise use cases, particularly when it comes to information retrieval, because of the chance of hallucination. In cases where a human is asking it a question to which that user doesn’t know the answer, GPT-4 is still probably not appropriate. “Even if the hallucination rate goes down, until it is infinitesimal, or at least as small as would be the case with an expert human analyst, it is probably not appropriate to use it,” said Aaron Kalb, co-founder and chief strategy officer at Alation, a software company that builds data cataloging and retrieval software.

He also said that even prompting the model to answer only from a particular set of data or only using the model to summarize information surfaced through a traditional search algorithm might not be sufficient to be certain the model wasn’t making up some part of its answer or surfacing inaccurate or outdated information that it had ingested during its pre-training.
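
The grounding pattern Kalb is describing can be sketched in a few lines: retrieve passages with a conventional keyword search and instruct the model to answer only from them. The example below shows the shape of that prompt; as he notes, even this does not guarantee the model will stay within the passages. The documents, scoring, and prompt wording are illustrative assumptions.

```python
# Sketch of grounding a model's answer in retrieved passages: a conventional
# keyword search picks the passages, and the prompt tells the model to answer
# only from them. Illustrative only; it does not by itself prevent hallucination.
documents = {
    "policy-2023.txt": "Employees accrue 1.5 vacation days per month of service.",
    "faq.txt": "Vacation requests must be submitted at least two weeks in advance.",
}

def keyword_search(query, docs, top_k=2):
    """Rank documents by naive keyword overlap with the query."""
    terms = set(query.lower().split())
    scored = sorted(docs.items(),
                    key=lambda kv: -len(terms & set(kv[1].lower().split())))
    return scored[:top_k]

def build_grounded_prompt(query, docs):
    """Assemble a prompt that restricts the model to the retrieved passages."""
    hits = keyword_search(query, docs)
    context = "\n".join(f"[{name}] {text}" for name, text in hits)
    return (
        "Answer the question using ONLY the passages below. "
        "If the passages do not contain the answer, say you do not know.\n\n"
        f"{context}\n\nQuestion: {query}\nAnswer:"
    )

print(build_grounded_prompt("How many vacation days do employees accrue?", documents))
```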

Kalb said whether it was appropriate to use large language models would depend on the use case and whether it was practical for a human to review the A.I.’s answers. He said that asking GPT-4 to generate marketing copy, in cases where that copy is going to be reviewed and edited by a human, was probably fine. But in situations where it wasn’t possible for a human to fact-check everything the model produced, relying on GPT-4’s answers might be dangerous.
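
That triage can be captured in a simple decision rule, sketched below with illustrative categories: allow model drafts where a human will review them anyway, and avoid unreviewed answers where no one can realistically fact-check the output.

```python
# Illustrative sketch of the use-case triage Kalb describes: gate reliance on
# model output by whether a human will review it and how high the stakes are.
from dataclasses import dataclass

@dataclass
class UseCase:
    name: str
    human_reviews_output: bool
    high_stakes: bool

def model_use_policy(case: UseCase) -> str:
    if case.human_reviews_output and not case.high_stakes:
        return "ok: use a model draft, with a human editing before publication"
    if case.human_reviews_output and case.high_stakes:
        return "caution: use only with mandatory expert review"
    return "avoid: unreviewed model answers are too risky here"

print(model_use_policy(UseCase("marketing copy", True, False)))
print(model_use_policy(UseCase("automated tax advice", False, True)))
```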
