The Less Sexy Side of A.I.: They’re Listening to What You Say to Alexa
By Alyssa Newcomb
When users ask Alexa about their mysterious rash, or to turn off the lights, they might not expect someone else to be listening. But A.I. needs human input, and human reviewers, to become smarter. This week, a Bloomberg report pulled back the curtain on the team of people around the world, numbering in the thousands, who are tasked with listening to the Alexa queries of unsuspecting users.

The employees listen to recordings of people asking Alexa to turn off the lights or play Taylor Swift. They transcribe the queries and feed them back to the Alexa software, making it smarter and more adept at grasping the way humans speak.

“It is normal to train this way, and a less sexy side of A.I.,” said Nico Acosta, director of product and engineering at Twilio Autopilot, a platform that allows developers to build bots and Alexa apps. “All speech engines need to be trained on real world audio, which implies the need to have a human transcribe it to continuously train the engine.”

There’s a clear privacy trade-off in having these smart speakers in your home. In a statement to Fortune, an Amazon spokesperson said the company uses “an extremely small number of interactions from a random set of customers,” who are not identifiable to the employees doing the listening.

“For example, this information helps us train our speech recognition and natural language understanding systems, so Alexa can better understand your requests, and ensure the service works well for everyone,” the spokesperson said. “We have strict technical and operational safeguards, and have a zero tolerance policy for the abuse of our system.”

Raw human training data is “critical” to maintaining the quality of the service, said Richard Ford, chief scientist at cybersecurity firm Forcepoint.

“If you want to do voice recognition for Alexa, the best data to train it on is on actual ‘as used’ scenarios, where there’s background noise, dogs barking, people changing their minds… all the ‘mess’ that you find in the real world,” said Ford.

However, he said, there are other ways Amazon could train Alexa without eavesdropping on tens of millions of queries.

“You could pay people to opt in to share their data willingly, or take part in trials, but at the end of the day, getting truly realistic data in a tractable way probably involves capturing real world data,” he said. “There are mitigations you can potentially put in place to minimize the privacy risks, but they are not infallible. Privacy is a confluence of good governance, good design, and good implementation.”

While the story has added to the concerns of people already worried about the privacy implications of letting a tech giant’s smart speaker live in their home, Amazon said its speaker only records queries and sends them to the cloud after it hears its wake word, such as “Alexa” or “Amazon.” A clear sign that the Echo speaker is recording: the device’s blue ring lights up.
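For readers who want a concrete picture of that gating behavior, here is a minimal sketch in Python. It only simulates the general pattern Amazon describes, with text frames standing in for audio; the function names and detection logic are hypothetical illustrations, not Amazon’s actual implementation.

```python
# Hypothetical simulation of wake-word gating: audio stays on the device
# until a wake word is heard, and only the clip that follows is uploaded.
# Text frames stand in for audio; nothing here is Amazon's real code.

WAKE_WORDS = {"alexa", "amazon"}

def detect_wake_word(frame: str) -> bool:
    """Stand-in for the on-device wake-word detector."""
    return any(word in frame.lower() for word in WAKE_WORDS)

def send_to_cloud(clip: str) -> None:
    """Stand-in for the upload step; only gated clips ever reach it."""
    print(f"uploaded: {clip!r}")

def listen_loop(frames: list[str]) -> None:
    recording = False
    clip: list[str] = []
    for frame in frames:
        if not recording and detect_wake_word(frame):
            recording = True          # this is when the blue ring lights up
            clip = [frame]
        elif recording and frame == "<silence>":
            send_to_cloud(" ".join(clip))
            recording = False         # ring goes dark; uploading stops
        elif recording:
            clip.append(frame)
        # frames heard while not recording are discarded locally

listen_loop(["dog barking", "alexa", "turn off the lights",
             "<silence>", "private conversation"])
# prints: uploaded: 'alexa turn off the lights'
```

The point of the pattern is that ambient audio, like the dog barking or the private conversation in the example, never leaves the device; only the wake-word-gated request does.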
There are ways to get rid of old recordings. Users can manually delete everything they have ever asked Alexa by visiting the Amazon Connect and Devices website. Once there, select “Devices,” choose the Amazon Echo, and then “Manage voice recordings.”

To opt out of being an unwitting A.I. trainer altogether, open the Amazon Alexa app and tap the menu button in the upper left corner of the screen. Then select “Alexa Account” and “Alexa Privacy.” Choose “Manage how your data improves Alexa,” and toggle off the buttons next to “Help Develop New Features” and “Use Messages to Improve Transcriptions.” These settings will keep Amazon from using raw recordings to train its software.

Of course, if too many people opt for privacy, the A.I. will take a lot longer to improve its understanding of natural language. “Getting such a corpus is really hard without using real data, which is why there’s often a genuine need to collect data from actual usage,” said Ford. “If you’re going to deliver your product on time and with a high efficacy, it’s a hard problem.”
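To make the loop Acosta and Ford describe concrete, the sketch below shows the shape of the transcribe-and-retrain pipeline. Everything in it is a hypothetical placeholder: real speech training operates on audio waveforms with a learning framework, but the data flow, where a human transcript becomes the label for a sampled recording, is the idea the article reports.

```python
import random

# Hypothetical pool of anonymized recordings (IDs stand in for audio files).
RECORDINGS = {
    "clip-001": "turn off the lights",
    "clip-002": "play taylor swift",
    "clip-003": "what is this rash on my arm",
}

def human_transcribe(clip_id: str) -> str:
    """Stand-in for the human reviewer who listens and types out the words."""
    return RECORDINGS[clip_id]

def retrain(model: list, labeled_pairs: list) -> list:
    """Placeholder training step: accumulate (audio, transcript) pairs."""
    model.extend(labeled_pairs)
    return model

# "An extremely small number of interactions from a random set of customers":
# sample a fraction of the pool rather than transcribing everything.
sample = random.sample(list(RECORDINGS), k=2)
labeled = [(clip_id, human_transcribe(clip_id)) for clip_id in sample]

model = retrain([], labeled)
print(f"trained on {len(model)} human-labeled clips")
```

Opting out, as described above, simply removes your clips from the pool such a pipeline can sample, which is exactly why Ford calls building the corpus without real data “a hard problem.”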