The next time you get on a Zoom call, you might want to ask the person you’re speaking with to push their finger into the side of their nose. Or maybe turn in complete profile to the camera for a minute.
Those are just some of the methods experts have recommended as ways to provide assurance that you are seeing a real image of the person you are speaking to and not an impersonation created with deepfake technology.
It sounds like a strange precaution, but we live in strange times.
In August, a top executive of the cryptocurrency exchange Binance said that fraudsters had used a sophisticated deepfake “hologram” of him to scam several cryptocurrency projects. Patrick Hillmann, Binance’s chief communications officer, says criminals had used the deepfake to impersonate him on Zoom calls. (Hillmann has not provided evidence to support his claim, and some experts are skeptical a deepfake was used. Nonetheless, security researchers say that such incidents are now plausible.) In July, the FBI warned that people could use deepfakes in job interviews conducted over video conferencing software. A month earlier, several European mayors said they were initially fooled by a deepfake video call purporting to be with Ukrainian President Volodymyr Zelensky. Meanwhile, a startup called Metaphysic that develops deepfake software has made it to the finals of “America’s Got Talent” by creating remarkably good deepfakes of Simon Cowell and the other celebrity judges, transforming other singers into the celebs in real time, right before the audience’s eyes.
Deepfakes are extremely convincing fake images and videos created through the use of artificial intelligence. It once required a lot of images of someone, a lot of time, and a fair degree of both coding skill and special-effects know-how to create a believable deepfake. And even once created, the A.I. model couldn’t be run fast enough to produce a deepfake in real time on a live video transmission.
That’s no longer the case, as both the Binance story and Metaphysic’s “America’s Got Talent” act highlight. In fact, it’s becoming increasingly easy for people to use deepfake software to impersonate others in live video transmissions. Software allowing someone to do this is now readily available, for free, and requires relatively little technical skill to use. And as the Binance story also shows, this opens the possibility for all kinds of fraud—and political disinformation.
“I am surprised by how fast live deepfakes have come and how good they are,” Hany Farid, a computer scientist at the University of California at Berkeley who is an expert in video analysis and authentication, says. He says there are at least three different open source programs that allow people to create live deepfakes.
Farid is among those who worry that live deepfakes could supercharge fraud. “This is going to be like phishing scams on steroids,” he says.
The “pencil test” and other tricks to catch an A.I. impostor
Luckily, experts say there are still a number of techniques a person can use to give themselves a reasonable assurance that they are not communicating with a deepfake impersonation. One of the most reliable is simply to ask a person to turn so that the camera is capturing her in complete profile. Deepfakes struggle with profiles for a number of reasons. For most people, there aren’t enough profile images available to train a deepfake model to reliably reproduce the angle. And while there are ways to use computer software to estimate a profile view from a front-facing image, using this software adds complexity to the process of creating the deepfake.
Deepfake software also uses “anchor points” on a person’s face to properly position the deepfake “mask” on top of it. Turning 90 degrees eliminates half of the anchor points, which often results in the software warping, blurring, or distorting the profile image in strange ways that are very noticeable.
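To make the geometry concrete, here is a toy Python sketch of that effect. The landmark coordinates are invented for illustration, and treating each point’s direction from the head center as its outward-facing normal is a crude stand-in for what real face trackers compute, not any actual deepfake tool’s logic:

```python
# Toy illustration: why a 90-degree head turn starves a face swap of anchor
# points. Landmark coordinates are made up; each point's direction from the
# head center is used as a crude proxy for its outward-facing normal.
import numpy as np

def yaw(deg: float) -> np.ndarray:
    """Rotation matrix for turning the head `deg` degrees about the vertical axis."""
    r = np.radians(deg)
    return np.array([[ np.cos(r), 0.0, np.sin(r)],
                     [ 0.0,       1.0, 0.0      ],
                     [-np.sin(r), 0.0, np.cos(r)]])

# A crude, symmetric "face": (x, y, z) anchor points, camera looking along +z.
LANDMARKS = np.array([
    [-4.0,  0.0, 1.0], [4.0,  0.0, 1.0],   # jaw corners
    [-2.0,  2.0, 2.0], [2.0,  2.0, 2.0],   # outer eye corners
    [ 0.0,  0.0, 3.0],                     # nose tip
    [-1.0, -2.0, 2.0], [1.0, -2.0, 2.0],   # mouth corners
])

def visible_count(points: np.ndarray, yaw_deg: float) -> int:
    """Count landmarks still facing the camera after the head turns."""
    camera = np.array([0.0, 0.0, 1.0])
    rotated = points @ yaw(yaw_deg).T
    directions = rotated / np.linalg.norm(rotated, axis=1, keepdims=True)
    return int((directions @ camera > 0.1).sum())

for angle in (0, 45, 90):
    print(f"{angle:>2} degrees: {visible_count(LANDMARKS, angle)} of {len(LANDMARKS)} anchor points visible")
```

Run it and the count drops from all seven points at 0 degrees to three at 90. With half its anchor points gone, the software has nothing to pin the “mask” to on the hidden side of the face, which is where the telltale warping comes from.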
Yisroel Mirsky, a researcher who heads the Offensive AI Lab at Israel’s Ben-Gurion University, has experimented with a number of other methods for detecting live deepfakes that he has compared to the CAPTCHA system used by many websites to detect software bots (you know, the one that asks you to pick out all the images of traffic lights in a photo broken up into squares). His techniques include asking people on a video call to pick up a random object and move it across their face, to bounce an object, to lift up and fold their shirt, to stroke their hair, or to mask part of their face with their hand. In each case, either the deepfake will fail to depict the object being passed in front of the face or the method will cause serious distortion to the facial image. For audio deepfakes, Mirsky suggests asking the person to whistle, or to try to speak with an unusual accent, or to hum or sing a tune chosen at random.
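As a sketch of how Mirsky’s CAPTCHA-like idea could be wired up, the short Python script below picks an unpredictable set of challenges for a call. The challenge wording and function names are illustrative assumptions paraphrasing the tests described here, not a protocol published by his lab:

```python
# A minimal sketch of a CAPTCHA-style liveness-challenge picker for video
# calls. The challenge lists paraphrase the tests described in this article;
# nothing here is a published standard.
import secrets

VIDEO_CHALLENGES = [
    "Pick up a nearby object and pass it slowly across your face.",
    "Turn your head so the camera sees your full profile.",
    "Cover half of your face with your hand for a few seconds.",
    "Lift up and fold part of your shirt.",
    "Stroke your hair from front to back.",
]

AUDIO_CHALLENGES = [
    "Whistle a few notes.",
    "Say your last sentence again in an unusual accent.",
    "Hum the first tune the caller names.",
]

def pick_challenges(n_video: int = 2, n_audio: int = 1) -> list[str]:
    """Choose a random, non-repeating set of challenges for one call.

    secrets.SystemRandom keeps the draw unpredictable, so an attacker
    cannot pre-train a deepfake against a known, fixed challenge.
    """
    rng = secrets.SystemRandom()
    return rng.sample(VIDEO_CHALLENGES, n_video) + rng.sample(AUDIO_CHALLENGES, n_audio)

if __name__ == "__main__":
    for challenge in pick_challenges():
        print("-", challenge)
```

The randomness is the point: as Mirsky explains below, a deepfake can only be trained for tasks the attacker anticipated, so a challenge drawn at call time is far harder to fake than a fixed one.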
“All of today’s existing deepfake technologies follow a very similar protocol,” Mirsky says. “They are trained on lots and lots of data and that data has to have a particular pattern you are teaching the model.” Most A.I. software is taught to just reliably mimic a person’s face seen from the front and can’t handle oblique angles or objects that occlude the face well.
Meanwhile, Farid has shown that another way to detect possible deepfakes is to use a simple software program that causes the other person’s computer screen to flicker in a certain pattern or causes it to project a light pattern onto the face of the person using the computer. Either the deepfake will fail to transfer the lighting effect to the impersonation or it will be too slow to do so. A similar detection might be possible just by asking someone to use another light source, such as a smartphone flashlight, to illuminate their face from a different angle, Farid says.
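The heart of that flicker test is a simple correlation check: does the brightness measured on the face track the pattern the screen displayed, within a plausible few frames of latency? Here is a minimal sketch of that check in Python. The frame capture and face cropping are assumed to happen elsewhere, and the demo data is synthetic:

```python
# A minimal sketch of the core check in the flicker test: compare the
# brightness pattern a screen displayed against the brightness measured on
# the caller's face, frame by frame. Capture and face cropping are assumed
# to happen elsewhere; the demo below uses synthetic data.
import numpy as np

def lighting_response_score(displayed: np.ndarray,
                            observed: np.ndarray,
                            max_lag: int = 5) -> float:
    """Best Pearson correlation between the displayed flicker pattern and the
    observed face brightness, allowing a few frames of latency.

    A live face should track the pattern closely (score near 1.0); a deepfake
    pipeline either drops the lighting cue entirely or lags well beyond any
    plausible latency window.
    """
    best = -1.0
    for lag in range(max_lag + 1):
        a = displayed[: len(displayed) - lag]
        b = observed[lag:]
        if a.std() == 0 or b.std() == 0:
            continue  # a flat signal carries no information
        best = max(best, float(np.corrcoef(a, b)[0, 1]))
    return best

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pattern = rng.uniform(0.2, 1.0, size=120)                     # per-frame screen brightness
    live = np.roll(pattern, 2) * 0.6 + rng.normal(0, 0.02, 120)   # face tracks it, 2 frames late
    fake = rng.uniform(0.2, 1.0, size=120)                        # uncorrelated impostor feed
    print("live score:", round(lighting_response_score(pattern, live), 2))  # near 1.0
    print("fake score:", round(lighting_response_score(pattern, fake), 2))  # low
```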
To realistically impersonate someone doing something unusual, Mirsky says that the A.I. software needs to have seen thousands of examples of people doing that thing. But collecting a data set like that is difficult. And even if you could train the A.I. to reliably impersonate someone doing one of these challenging tasks—like picking up a pencil and passing it in front of their face—the deepfake is still likely to fail if you ask the person to use a very different kind of object, like a mug. And attackers using deepfakes are also unlikely to have been able to train a deepfake to overcome multiple challenges, like both the pencil test and the profile test. Each different task, Mirsky says, increases the complexity of the training the A.I. requires. “You are limited in the aspects you want the deepfake software to perfect,” he says.
Deepfakes are getting better all the time
For now, few security experts are suggesting that people will need to use these CAPTCHA-like challenges for every Zoom meeting they take. But Mirsky and Farid both said that people might be wise to use them in high-stakes situations, such as a call between political leaders, or a meeting that might result in a high-value financial transaction. And both Farid and Mirsky urged people to be attuned to other possible red flags, such as audio calls from unfamiliar numbers or people behaving strangely or making unusual requests.
Farid says that for very important calls, people might use some kind of simple two-factor authentication, such as sending a text message to a mobile number you know to be the correct one for that person, asking if they are on a video call right now with you.
The researchers also emphasized that deepfakes are getting better all the time and that there is no guarantee that it won’t become much easier for them to evade any particular challenge—or even combinations of them—in the future.
That’s also why many researchers are trying to address the problem of live deepfakes from the opposite perspective—creating some sort of digital signature or watermark that would prove that a video call is authentic, rather than trying to uncover a deepfake.
One group that might work on a protocol for verifying live video calls is the Coalition for Content Provenance and Authentication (C2PA)—a foundation dedicated to digital media authentication standards that’s backed by companies including Microsoft, Adobe, Sony, and Twitter. “I think the C2PA should pick this up because they have built a specification for recorded video and extending it for live video is a natural thing,” Farid says. But Farid admits that trying to authenticate data that is being streamed in real time is not an easy technological challenge. “I don’t see immediately how to do it, but it will be interesting to think about,” he says.
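To give a flavor of what authenticating a live stream could involve, here is a minimal sketch in Python that chains frame hashes together and signs each link with an Ed25519 key from the cryptography library. It assumes a trusted capture device holds the private key, and it is an illustrative design, not the C2PA specification:

```python
# A minimal sketch of the "sign the stream" idea, assuming a trusted capture
# device holds a private key: each frame's hash is chained to the previous
# one and signed, so a verifier can confirm frames are authentic, in order,
# and unmodified. An illustrative design, not the C2PA specification.
import hashlib
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

def sign_stream(frames, key):
    """Yield (frame, signature) pairs; each signature covers the frame's hash
    chained with the previous hash, so splicing, reordering, or dropping
    frames breaks verification."""
    prev = b"\x00" * 32
    for frame in frames:
        digest = hashlib.sha256(prev + frame).digest()
        yield frame, key.sign(digest)
        prev = digest

def verify_stream(signed_frames, public_key) -> bool:
    """Recompute the hash chain and check every signature against it."""
    prev = b"\x00" * 32
    for frame, sig in signed_frames:
        digest = hashlib.sha256(prev + frame).digest()
        try:
            public_key.verify(sig, digest)
        except InvalidSignature:
            return False
        prev = digest
    return True

if __name__ == "__main__":
    key = Ed25519PrivateKey.generate()
    frames = [b"frame-%d" % i for i in range(5)]   # stand-ins for video frames
    signed = list(sign_stream(frames, key))
    print(verify_stream(signed, key.public_key())) # True
    signed[2] = (b"tampered", signed[2][1])        # splice in a fake frame
    print(verify_stream(signed, key.public_key())) # False
```

The hash chain means deepfaked frames cannot be spliced in, reordered, or dropped without breaking verification. The harder problems, such as keeping the private key inside tamper-resistant capture hardware and signing at full frame rate without adding latency, are part of why Farid calls real-time authentication a difficult challenge.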
In the meantime, remind the guests on your next Zoom call to bring a pencil to the meeting.