Interface design will decide the winner of the voice-interaction technology war
Recently there's been a lot of comparison between Apple's Siri and Google's Voice Search. Microsoft's voice breakthroughs have also captured headlines. After decades of research and false starts, the competition in voice interfaces is now heating up thanks to its appearance on mobile devices, and the race is on to shape the definitive voice interface for the mass market.

But if the ballads to Siri's limitations or sites dedicated to her often-hilarious interpretations of simple instructions are any indication, we still have a long way to go until a winner is declared. To succeed, the big players will need to conquer the human challenges to voice tech if they want to design a service that most people will happily incorporate into their daily routines.

For any advanced voice service, in addition to great voice recognition and interpretation, you need a compelling and simple interface that feels personal, context awareness that adds depth, and a very clever and fast backend that continuously learns the user's intent. No one service is the ultimate answer – yet.

If this type of voice assistant did exist, it would have the potential to make voice interaction go from niche to mainstream. Here's why.

The human component

Most people like to talk. But when faced with talking to machines, most of us get intimidated. Hand the biggest extrovert a microphone and they tend to clam up. Or just observe someone trying out a voice service for the first time. It simply doesn't feel (or look) easy or natural.

So why don't people like to talk to machines? Feedback (or the lack of it) is a big reason. When talking with another person, there are rich layers of feedback throughout the interaction – facial expressions, body language, tone of voice, and more. Constant real-time feedback is central in human communication, and both speaker and listener are active participants in the communication. With voice services, most of this feedback and interaction is stripped out.

Another reason that technical voice services failed to catch on earlier, even though they were common in computer programs, is that there's simply less need to use voice on computers compared to mobile devices. When using a computer, your hands are already committed, the QWERTY text input is pretty efficient, and seeing text as you type it also confirms it's correct. Voice input or output adds little value there. Smartphones offer the turning point for voice. When you're on the move, chances are high that your hands could do some other useful things if you can use speech to interact with your mobile in order to find things or get stuff done.