The creation and editing of photorealistic digital images is about to get much easier.
OpenAI, the San Francisco artificial intelligence company that is closely affiliated with Microsoft, just announced it has created an A.I. system that can take a description of an object or scene and automatically generate a highly realistic image depicting it. The system also allows a person to easily edit the image with simple tools and text modifications, rather than requiring traditional Photoshop or digital art skills.
“We hope tools like this democratize the ability for people to create whatever they want,” Alex Nichol, one of the OpenAI researchers who worked on the project, said. He said the tool could be useful for product designers, magazine cover designers, and artists—either to use for inspiration and brainstorming, or to actually create finished works. He also said computer game companies might want to use it to generate scenes and characters—although the software currently generates still images, not animation or videos.
Because the software could also be used to more easily generate racist memes, create fake images for propaganda or disinformation, or, for that matter, produce pornography, OpenAI says it has taken steps to limit the software’s capabilities in this area: first by trying to remove such images from the A.I.’s training data, and also by applying rule-based filters and human content review to the images the A.I. generates.
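The article does not describe how OpenAI's rule-based filters work. As a rough illustration only, one common layer of such a system is a simple blocklist check on the user's prompt before any image is generated. The blocklist terms and the function below are hypothetical, not OpenAI's actual implementation:

```python
# Minimal sketch of a rule-based prompt filter, one possible layer of
# content moderation. The blocklist and function name are illustrative
# assumptions, not OpenAI's real system.

BLOCKED_TERMS = {"violence", "gore", "nude"}  # hypothetical blocklist

def passes_rule_filter(prompt: str) -> bool:
    """Return False if the prompt contains any blocked term."""
    words = {w.strip(".,!?").lower() for w in prompt.split()}
    return not (words & BLOCKED_TERMS)

print(passes_rule_filter("a shiba inu wearing a beret"))  # True
print(passes_rule_filter("graphic gore scene"))           # False
```

A production system would combine several such layers (prompt filters, classifiers run on the generated images, and human review), as the paragraph above describes.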
OpenAI is also trying to carefully control the release of the new A.I., which it describes as currently just a research project and not a commercial product. It is sharing the software only with what it describes as a select and screened group of beta testers. But in the past, OpenAI’s breakthroughs based on natural-language processing have often found their way into commercial products within about 18 months.
The software OpenAI has created is called DALL-E 2, and it is an updated version of a system that OpenAI debuted in early 2021, simply called DALL-E. (The acronym is complicated, but it is meant to evoke a mashup of WALL-E, the animated robot of Pixar movie fame, and a play on words for Dali, as in Salvador, the surrealist artist, which makes sense given the surreal nature of the images the system can generate.)
The original DALL-E could render images only in a cartoonish manner, often against a plain background. The new DALL-E 2 can generate photo-quality high-resolution images, complete with complex backgrounds, depth-of-field effects, realistic shadows, shading, and reflections.
While these realistic renderings have been possible with computer-rendered images previously, creating them required some serious artistic skill. Here, all a user has to do is type the command, “a shiba inu wearing a beret and a black turtleneck,” and then DALL-E 2 spits out dozens of photorealistic variations on that theme.
DALL-E 2 also makes editing an image easy. A user can simply place a box around the part of the image they want to modify and specify the modification they want to make in natural-language instructions. You could, for instance, put a box around the Shiba Inu’s beret and type “make the beret red,” and the beret would be transformed without altering the rest of the image. In addition, DALL-E 2 can produce the same image in a wide range of styles, which the user can also specify in plain text.
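The box-and-instruction workflow described above is essentially mask-based editing: pixels inside the user's box are regenerated to satisfy the instruction, while everything outside the box is preserved. A toy sketch over an RGB pixel grid makes the mechanics concrete; note that the recolor here is hardcoded, whereas real inpainting regenerates the masked region with a generative model:

```python
# Toy sketch of mask-based editing: change only the pixels inside a
# user-drawn box, leaving the rest of the image untouched. A real
# system would regenerate the region with a generative model; the
# hardcoded recolor is purely illustrative.

def edit_in_box(image, box, new_color):
    """image: 2D list of RGB tuples; box: (top, left, bottom, right)."""
    top, left, bottom, right = box
    edited = [row[:] for row in image]  # copy so the original is kept
    for y in range(top, bottom):
        for x in range(left, right):
            edited[y][x] = new_color
    return edited

image = [[(0, 0, 0)] * 4 for _ in range(4)]     # 4x4 all-black image
red = (255, 0, 0)
result = edit_in_box(image, (0, 0, 2, 2), red)  # "make the beret red"
print(result[0][0], result[3][3])  # (255, 0, 0) (0, 0, 0)
```

The key property, mirrored in DALL-E 2's editing mode, is that the edit is local: pixels outside the box are byte-for-byte identical to the original.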
The captioning and image classification algorithms that underpin DALL-E 2 are, according to tests OpenAI performed, less susceptible to attempts to trick them by labeling an object with text that names something other than what the object actually is. For instance, previous algorithms trained to associate text and images, when shown an apple with a printed label saying “pizza” attached to it, would mistakenly classify the image as a pizza. The system that now makes up part of DALL-E 2 does not make the same mistake: it still identifies the image as being of an apple.
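The apple-labeled-"pizza" failure mode can be illustrated with a toy comparison of two classifiers: a naive one that trusts any text it reads in the image, and a robust one that classifies from visual content. The feature dictionaries below are invented toy data, not real model embeddings:

```python
# Toy illustration of the text-label trick: a classifier that trusts
# text printed on an object is fooled, while one that weighs visual
# features is not. The feature dicts are invented, not real embeddings.

apple_with_pizza_label = {
    "visual_shape": "apple",   # what the object actually looks like
    "printed_text": "pizza",   # a label someone stuck on the object
}

def naive_classify(obs):
    # Trusts any text found in the image over its visual content.
    return obs["printed_text"] or obs["visual_shape"]

def robust_classify(obs):
    # Classifies from visual content; printed text is ignored.
    return obs["visual_shape"]

print(naive_classify(apple_with_pizza_label))   # pizza  (fooled)
print(robust_classify(apple_with_pizza_label))  # apple  (correct)
```

OpenAI's tests suggest DALL-E 2's underlying classifiers behave like the robust case, weighing what the object looks like over what any attached text says.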
Ilya Sutskever, OpenAI’s cofounder and chief scientist, said that DALL-E 2 was an important step toward OpenAI’s goal of trying to create artificial general intelligence (AGI), a single piece of A.I. software that can achieve human-level or better than human-level performance across a wide range of disparate tasks. AGI would need to possess “multimodal” conceptual understanding—being able to associate a word with an image or set of images and vice versa, Sutskever said. And DALL-E 2 is an attempt to create an A.I. with this sort of understanding, he said.
In the past, OpenAI has tried to pursue AGI through natural-language processing. The company’s one commercial product is a programming interface that lets other businesses access GPT-3, a massive natural-language processing system that can compose long passages of novel text, as well as perform a number of other natural-language tasks, from translation to summarization.
DALL-E 2 is far from perfect, though. The system sometimes cannot render details in complex scenes. It can get some of the lighting and shadow effects slightly wrong, or merge the borders of two objects that should be distinct. It is also less adept than some other multimodal A.I. software at understanding “binding attributes.” Given the instruction “a red cube on top of a blue cube,” it will sometimes offer variations in which the red cube appears below the blue cube.