2026-05-21 Max Planck Institute

How do people perceive the difference between real and computer-generated voices? © Illustration: MPIEA / L. Bittner
<Related information>
- https://www.mpg.de/26530630/human-or-machine
- https://www.sciencedirect.com/science/article/pii/S0167639326000464
Perception of humanness is affected by speech content
Janniek Wester, Pauline Larrouy-Maestri
Speech Communication Available online: 16 April 2026
DOI: https://doi.org/10.1016/j.specom.2026.103398
Highlights
- Computer generated voices still sound less human than human voices.
- Humanness perception of voices is modulated by the meaning and structure of speech.
- Speech content does not affect the humanness perception of non-native listeners.
- Prosody and summary acoustics of synthetic voices are different from human ones.
- Older adults perceive synthetic voices as sounding more human than younger adults.
Abstract
The increasing use of computer-generated speech in various applications has raised questions about how people perceive synthetic voices. This study investigates the role of linguistic information in the perception of humanness in speech. We conducted two experiments with native German-, Spanish- and Turkish-speaking participants who rated the human-likeness of human and text-to-speech (TTS)-generated voices. By presenting German sentences as well as manipulated versions of those sentences in terms of syntax and semantics, we examined the role of three types of linguistic information, that is, phonetics, semantics, and syntax, on humanness perception. Acoustic analyses revealed differences between human and TTS-generated voices in terms of summary acoustics and dynamic contours of pitch and intensity, thus showing that TTS-generated voices are not yet fully aligned with human voices on voice quality and prosody. Importantly, behavioral results showed that these acoustic differences were more salient to native German listeners, who distinguished between human and synthetic voices more extremely. In addition to the role of phonetic or phonological familiarity, we observed a role of both syntax and semantics in humanness perception, with the manipulated sentences sounding less human regardless of the speaker (i.e., TTS-generated or human), but only for the native speakers. Lastly, humanness perception of speech appears to be relatively idiosyncratic as indicated by the individual differences observed. Altogether, this study contributes to our understanding of the interplay between linguistic and paralinguistic information in speech perception, and clarifies how listeners perceive their increasingly synthetically-generated soundscape.
