New model lets AI imitate everyday sounds (Teaching AI to Communicate Sounds as Humans Do)

2025-01-09 Massachusetts Institute of Technology (MIT)

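A research team at the Massachusetts Institute of Technology (MIT) has developed an AI model that simulates the movement of the human vocal cords and mouth, and has used it to reproduce and interpret everyday sounds the way people do. The model can produce human-like imitations of a wide range of sounds, such as an ambulance siren or a cat's meow. It also works in reverse, inferring the original sound from a person's vocal imitation of it. The technology is expected to contribute to new sound-based interfaces in entertainment and education.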

<Related information>

Sketching With Your Voice: “Non-Phonorealistic” Rendering of Sounds via Vocal Imitation

Matthew Caren, Kartik Chandra, Joshua Tenenbaum, Jonathan Ragan-Kelley, Karima Ma
SA ’24: SIGGRAPH Asia 2024 Conference Papers
Published: 03 December 2024
DOI: https://doi.org/10.1145/3680528.3687679

Abstract

We present a method for automatically producing human-like vocal imitations of sounds: the equivalent of “sketching,” but for auditory rather than visual representation. Starting with a simulated model of the human vocal tract, we first try generating vocal imitations by tuning the model’s control parameters to make the synthesized vocalization match the target sound in terms of perceptually-salient auditory features. Then, to better match human intuitions, we apply a cognitive theory of communication to take into account how human speakers reason strategically about their listeners. Finally, we show through several experiments and user studies that when we add this type of communicative reasoning to our method, it aligns with human intuitions better than matching auditory features alone does. This observation has broad implications for the study of depiction in computer graphics.
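
The two stages the abstract describes, matching auditory features and then reasoning about the listener, can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the two-dimensional feature vectors, the candidate imitations `a` and `b`, the temperature `tau`, and the function names are all hypothetical. The listener model is a simple similarity-based guesser in the spirit of listener-aware communication models, and the abstract does not specify the theory at this level of detail.

```python
import math

# Hypothetical 2-D "auditory feature" vectors for target sounds.
# These numbers are invented for illustration only.
SOUNDS = {
    "siren": (0.9, 0.2),
    "cat": (0.8, 0.3),
    "engine": (0.1, 0.9),
}

# Hypothetical feature vectors of two vocalizations a simulated
# vocal tract could produce.
IMITATIONS = {
    "a": (0.88, 0.22),  # very close to the siren, but also close to the cat
    "b": (1.0, 0.1),    # further from the siren, but much further from the cat
}


def sq_dist(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


def feature_match(target):
    """Stage 1: pick the imitation whose features are closest to the target."""
    return min(IMITATIONS, key=lambda i: sq_dist(IMITATIONS[i], SOUNDS[target]))


def listener(imitation, tau=0.02):
    """Assumed listener: guesses the source sound with probability
    proportional to exp(-distance / tau)."""
    scores = {
        s: math.exp(-sq_dist(IMITATIONS[imitation], f) / tau)
        for s, f in SOUNDS.items()
    }
    total = sum(scores.values())
    return {s: v / total for s, v in scores.items()}


def communicative_choice(target):
    """Stage 2 (simplified): pick the imitation that makes the assumed
    listener most likely to recover the intended target."""
    return max(IMITATIONS, key=lambda i: listener(i)[target])


if __name__ == "__main__":
    print("feature matching picks:", feature_match("siren"))
    print("listener-aware choice: ", communicative_choice("siren"))
```

In this toy setup, pure feature matching prefers imitation `a` for the siren because it is acoustically closest, while the listener-aware choice prefers `b` because it is less likely to be mistaken for the cat. The same `listener` function also illustrates the reverse direction mentioned in the summary: given an imitation, it returns a guess about which sound was intended.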