AI headphones automatically learn who you’re talking to ― and let you hear them better

2025-12-09 University of Washington (UW)

To tackle the so-called "cocktail party problem," in which multiple conversations blend together in noisy settings, a University of Washington research team has developed AI-powered headphones. When the wearer starts speaking, the new prototype's AI analyzes the rhythm of the conversation (who spoke when) to automatically identify the wearer's conversation partners, then suppresses background noise and other voices so that only the partners' voices come through clearly. The system, called "Proactive Hearing Assistants," is built from off-the-shelf noise-canceling hardware and binaural microphones, and can identify a conversation partner from just 2 to 4 seconds of audio. In experiments it separated one to four conversation partners in real time, and participants rated the denoised speech as more than twice as intelligible as the raw audio. The technology enables a smarter, more autonomous form of hearing assistance that extracts a voice only when the user wants to focus on a particular speaker, and could eventually be applied to hearing aids, smart glasses, and other devices.
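As a rough illustration of the turn-taking idea described above, the sketch below scores candidate speakers by how cleanly their speech alternates with the wearer's own speech, using the wearer's self-speech as the anchor. All names, data structures, and thresholds here are hypothetical, not taken from the paper.

```python
# Minimal sketch: a conversation partner mostly speaks while the wearer
# is silent, so score candidates by (1 - fraction of their speech that
# overlaps the wearer's). Illustrative only; the real system is learned.
from dataclasses import dataclass

@dataclass
class Segment:
    start: float  # seconds
    end: float

def overlap(a: Segment, b: Segment) -> float:
    """Duration (s) for which two speech segments overlap."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))

def partner_score(self_speech: list[Segment], candidate: list[Segment]) -> float:
    """Near 1.0 when the candidate takes turns with the wearer,
    near 0.0 for a background talker who speaks over them."""
    talk = sum(s.end - s.start for s in candidate)
    if talk == 0:
        return 0.0
    overlapped = sum(overlap(s, c) for s in self_speech for c in candidate)
    return 1.0 - overlapped / talk

# Usage: speaker A alternates with the wearer, speaker B talks over them.
wearer = [Segment(0.0, 1.5), Segment(3.0, 4.2)]
speaker_a = [Segment(1.6, 2.9), Segment(4.3, 5.5)]
speaker_b = [Segment(0.2, 1.4), Segment(3.1, 4.0)]
for name, segs in [("A", speaker_a), ("B", speaker_b)]:
    print(name, round(partner_score(wearer, segs), 2))  # A -> 1.0, B -> 0.0
```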

The team combined off-the-shelf noise-canceling headphones with binaural microphones to create the prototype, pictured here. Hu et al./EMNLP

<Related Information>

Proactive Hearing Assistants that Isolate Egocentric Conversations

Guilin Hu, Malek Itani, Tuochao Chen, Shyamnath Gollakota
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
DOI: https://doi.org/10.18653/v1/2025.emnlp-main.1289

Abstract

We introduce proactive hearing assistants that automatically identify and separate the wearer’s conversation partners, without requiring explicit prompts. Our system operates on egocentric binaural audio and uses the wearer’s self-speech as an anchor, leveraging turn-taking behavior and dialogue dynamics to infer conversational partners and suppress others. To enable real-time, on-device operation, we propose a dual-model architecture: a lightweight streaming model runs every 12.5 ms for low-latency extraction of the conversation partners, while a slower model runs less frequently to capture longer-range conversational dynamics. Results on real-world 2- and 3-speaker conversation test sets, collected with binaural egocentric hardware from 11 participants totaling 6.8 hours, show generalization in identifying and isolating conversational partners in multi-conversation settings. Our work marks a step toward hearing assistants that adapt proactively to conversational dynamics and engagement.
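The dual-model split in this abstract (a lightweight streaming model on every 12.5 ms hop, plus a slower model that runs less frequently for long-range dynamics) can be pictured as a scheduling loop like the minimal sketch below. Only the 12.5 ms hop comes from the abstract; the stand-in models and the roughly one-second refresh interval are assumptions.

```python
# Sketch of a dual-rate inference loop: a fast pass on every hop,
# conditioned on a profile that a slower pass refreshes occasionally.
import numpy as np

HOP_MS = 12.5        # streaming hop length, per the abstract
SLOW_EVERY_N = 80    # assumed: refresh the slow model every 80 hops (~1 s)

def fast_extract(chunk: np.ndarray, profile: np.ndarray) -> np.ndarray:
    """Stand-in for the low-latency extraction model (a neural net in the paper)."""
    return chunk * profile[0]

def slow_update(recent: list[np.ndarray]) -> np.ndarray:
    """Stand-in for the slower model that tracks conversational dynamics."""
    return np.asarray([float(np.mean(np.concatenate(recent)))])

def run(stream):
    profile = np.ones(1)
    history: list[np.ndarray] = []
    for i, chunk in enumerate(stream):
        history.append(chunk)
        if i and i % SLOW_EVERY_N == 0:
            # Coarse, infrequent pass over recent context.
            profile = slow_update(history[-SLOW_EVERY_N:])
        # Fast pass on every 12.5 ms hop, using the latest profile.
        yield fast_extract(chunk, profile)
```

The point of the split is that the expensive, context-hungry reasoning about who the partners are does not have to fit in the per-hop latency budget; only the extraction itself does.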


Wireless Hearables With Programmable Speech AI Accelerators

Malek Itani, Tuochao Chen, Arun Raghavan, Gavriel Kohlberg, Shyamnath Gollakota
arXiv, last revised 22 Oct 2025 (this version, v2)
DOI: https://doi.org/10.48550/arXiv.2503.18698

Abstract

The conventional wisdom has been that designing ultra-compact, battery-constrained wireless hearables with on-device speech AI models is challenging due to the high computational demands of streaming deep learning models. Speech AI models require continuous, real-time audio processing, imposing strict computational and I/O constraints. We present NeuralAids, a fully on-device speech AI system for wireless hearables, enabling real-time speech enhancement and denoising on compact, battery-constrained devices. Our system bridges the gap between state-of-the-art deep learning for speech enhancement and low-power AI hardware by making three key technical contributions: 1) a wireless hearable platform integrating a speech AI accelerator for efficient on-device streaming inference, 2) an optimized dual-path neural network designed for low-latency, high-quality speech enhancement, and 3) a hardware-software co-design that uses mixed-precision quantization and quantization-aware training to achieve real-time performance under strict power constraints. Our system processes 6 ms audio chunks in real-time, achieving an inference time of 5.54 ms while consuming 71.6 mW. In real-world evaluations, including a user study with 28 participants, our system outperforms prior on-device models in speech quality and noise suppression, paving the way for next-generation intelligent wireless hearables that can enhance hearing entirely on-device.
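The real-time claim in this abstract can be sanity-checked from the quoted numbers alone: a 6 ms chunk processed in 5.54 ms leaves under half a millisecond of headroom per chunk. The short calculation below works that out; the derived quantities (real-time factor, energy per inference) are our own arithmetic on the published figures, not values reported in the paper.

```python
# Back-of-the-envelope check on the streaming budget from the abstract.
CHUNK_MS = 6.0       # audio chunk length, per the abstract
INFERENCE_MS = 5.54  # measured inference time, per the abstract
POWER_MW = 71.6      # measured power draw, per the abstract

rtf = INFERENCE_MS / CHUNK_MS          # real-time factor; < 1.0 means it keeps up
headroom_ms = CHUNK_MS - INFERENCE_MS  # slack per chunk for I/O and scheduling
energy_mj = POWER_MW * (INFERENCE_MS / 1000.0)  # mW x s = mJ per inference

print(f"real-time factor:     {rtf:.2f}")        # ~0.92
print(f"headroom per chunk:   {headroom_ms:.2f} ms")  # ~0.46 ms
print(f"energy per inference: {energy_mj:.3f} mJ")    # ~0.40 mJ
```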
