AI can't see as well as humans, and how to fix it

2025-07-22 École Polytechnique Fédérale de Lausanne (EPFL)

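Research at EPFL has shown that AI cannot recognize objects through "contour integration" the way humans do. In the experiment, when only 35% of an object's contours remained in the image, humans still achieved roughly 50% accuracy, whereas AI models performed at close to chance. The reason is that the models rely on local features and lack the ability to perceive the overall shape. The research team introduced a method that trains AI to acquire a human-like visual bias and thereby improved contour-based recognition. The work is expected to help make AI more reliable in safety-critical applications such as autonomous driving and medical image analysis.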

<Related information>

Contour Integration Underlies Human-Like Vision

Ben Lonnqvist, Elsa Scialom, Abdulkadir Gokce, Zehra Merchant, Michael H. Herzog, Martin Schrimpf
arXiv, last revised 19 Jun 2025 (this version, v2)
DOI: https://doi.org/10.48550/arXiv.2504.05253

Abstract

Despite the tremendous success of deep learning in computer vision, models still fall behind humans in generalizing to new input distributions. Existing benchmarks do not investigate the specific failure points of models by analyzing performance under many controlled conditions. Our study systematically dissects where and why models struggle with contour integration – a hallmark of human vision – by designing an experiment that tests object recognition under various levels of object fragmentation. Humans (n=50) perform at high accuracy, even with few object contours present. This is in contrast to models which exhibit substantially lower sensitivity to increasing object contours, with most of the over 1,000 models we tested barely performing above chance. Only at very large scales (∼ 5B training dataset size) do models begin to approach human performance. Importantly, humans exhibit an integration bias – a preference towards recognizing objects made up of directional fragments over directionless fragments. We find that not only do models that share this property perform better at our task, but that this bias also increases with model training dataset size, and training models to exhibit contour integration leads to high shape bias. Taken together, our results suggest that contour integration is a hallmark of object vision that underlies object recognition performance, and may be a mechanism learned from data at scale.
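
The integration bias described in the abstract can be illustrated with a small calculation. The sketch below is not the authors' code or their exact metric: it assumes the bias can be approximated as the accuracy gap between directional and directionless fragments, averaged over fragmentation levels. The function names, the trial format, and the toy data are all hypothetical.

```python
from collections import defaultdict


def accuracy_by_condition(trials):
    """Return {(fragment_type, contour_fraction): accuracy} from trial records."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for fragment_type, contour_fraction, is_correct in trials:
        key = (fragment_type, contour_fraction)
        total[key] += 1
        correct[key] += int(is_correct)
    return {key: correct[key] / total[key] for key in total}


def integration_bias(accuracy):
    """Mean accuracy gap (directional minus directionless).

    Assumption: this is an illustrative definition only; the paper may
    define the integration bias differently.
    """
    directional = {lvl for kind, lvl in accuracy if kind == "directional"}
    directionless = {lvl for kind, lvl in accuracy if kind == "directionless"}
    levels = sorted(directional & directionless)
    if not levels:
        raise ValueError("no fragmentation level has both fragment types")
    gaps = [
        accuracy[("directional", lvl)] - accuracy[("directionless", lvl)]
        for lvl in levels
    ]
    return sum(gaps) / len(gaps)


# Made-up toy trials: (fragment_type, fraction_of_contour_shown, answered_correctly)
toy_trials = [
    ("directional", 0.35, True),
    ("directional", 0.35, False),
    ("directionless", 0.35, False),
    ("directionless", 0.35, False),
    ("directional", 0.75, True),
    ("directional", 0.75, True),
    ("directionless", 0.75, True),
    ("directionless", 0.75, False),
]

acc = accuracy_by_condition(toy_trials)
bias = integration_bias(acc)
print(f"integration bias on toy data: {bias:+.2f}")
```

Under this assumed definition, a positive value means the observer (human or model) recognizes objects more easily when the fragments carry contour direction, which is the preference the abstract attributes to humans.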
