Comparing how primates and AI perceive the 3D world (Study offers glimpse into how monkeys, and machines, see a 3D world)

2025-07-07 Yale University

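A Yale University study found that monkeys and AI share a common "inverse graphics" process for inferring 3D structure from 2D images. Using a neural network called BIN, the researchers estimated 3D structure from images by way of an intermediate 2.5D representation, and the stages of that process matched neural activity in the macaque visual cortex. The result supports the hypothesis that the goal of vision is 3D understanding, and it may contribute both to AI image recognition and to understanding brain function.

<Related information>

Multiarea processing in body patches of the primate inferotemporal cortex implements inverse graphics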

Hakan Yilmaz, Aalap D. Shah, Ariadne Letrou, +2, and Ilker Yildirim
Proceedings of the National Academy of Sciences. Published: July 8, 2025
DOI: https://doi.org/10.1073/pnas.2420287122

Significance

Computer graphics creates abstractions of the data-generating processes in the external world, of how scenes of 3D objects form and project to images. The present study suggests that higher-level visual regions in the macaque brain, specifically the body network in the inferotemporal cortex, may have internalized an algorithm akin to the reverse of the graphics process, mapping images to 3D objects. Thus, inferring 3D objects, implemented via inverse graphics, may be a distinct computational-level objective of biological vision. This contrasts sharply with the currently dominant approaches in computational vision, which emphasize the sensitivity of biological vision to high-level image statistics and which, as we show, provide inferior accounts of neural activity.

Abstract

Stimulus-driven, multiarea processing in the inferotemporal (IT) cortex is thought to be critical for transforming sensory inputs into useful representations of the world. What are the formats of these neural representations and how are they computed across the nodes of the IT networks? A growing literature in computational neuroscience focuses on the computational-level objective of acquiring high-level image statistics that supports useful distinctions, including between object identities or categories. Here, inspired by classic theories of vision, we suggest an alternative possibility. We show that inferring 3D objects may be a distinct computational-level objective of IT, implemented via an algorithm analogous to graphics-based generative models of how 3D scenes form and project to images, but in the reverse order. Using perception of bodies as a case study, we show that inverse graphics spontaneously emerges in inference networks trained to map images to 3D objects. Remarkably, this correspondence to the reverse of a graphics-based generative model also holds across the body processing network of the macaque IT cortex. Finally, inference networks recapitulate the feedforward progression across the stages of this IT network and do so better than the currently dominant vision models, including both supervised and unsupervised variants, none of which aligns with the reverse of graphics. This work suggests inverse graphics as a multiarea neural algorithm implemented within IT, and points to ways for replicating primate vision capabilities in machines.
