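From "AI That Looks at One Object" to "AI That Compares Multiple Objects": A Point-Cloud Language Model That Can Explain Not Only Single Parts but Also the Geometric Relationships Between Parts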

2026-05-25 National Institute of Advanced Industrial Science and Technology (AIST)

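AIST has developed "Multi-3DLLM", a point-cloud language model that compares multiple 3D objects and explains in natural language how their parts mate and how their shapes differ. Conventional vision-language models have centered on recognizing and describing a single object. This work is distinguished by making it possible to understand geometric relationships among multiple objects, such as "which parts mate" and "where the objects differ". The research team built its own dataset, "MO3D", of about 70,000 3D point clouds paired with questions and answers, and used it to train the model on comparison, mating, and shape change. Multi-3DLLM adopts a structure that integrates the features of multiple objects and directly captures part-level geometric relationships. In evaluation experiments it substantially outperformed existing models, achieving 33.8% accuracy on the object comparison task, about 1.8 times that of conventional models. Applications are expected in part-mating decisions on the manufacturing floor, robotic assembly assistance, and 3D CAD editing support.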
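The summary gives the new accuracy and the improvement ratio but not the accuracy of the earlier models. As a rough back-of-the-envelope check, the sketch below derives the baseline those two figures imply. The variable names are illustrative, and because the ratio is stated only as "about 1.8 times", the result is an estimate and not a number reported by the source.

```python
# Figures taken from the summary above.
reported_accuracy = 33.8   # percent, object comparison task
improvement_ratio = 1.8    # "about 1.8 times" the conventional models

# Implied baseline accuracy (an estimate, since the ratio is approximate).
implied_baseline = reported_accuracy / improvement_ratio
absolute_gain = reported_accuracy - implied_baseline

print(f"implied baseline: ~{implied_baseline:.1f}%")
print(f"absolute gain:    ~{absolute_gain:.1f} percentage points")
```

Under these assumptions the earlier models would sit somewhere just under 19% on the same task, so the reported gain is roughly 15 percentage points.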

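Figure 1. Overview of the Multi-Object in 3D Dataset (MO3D). The dataset covers three types of question-answering tasks and contains a total of 70,000 pairs of 3D point clouds and question-answer data.
* The figure is cited and adapted from the original paper.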

<Related Information>

Beyond Single Object: Learning 3D Relations with Large Language Models

Kohsuke Ide, Ryousuke Yamada, Yue Qiu, Xianzheng Ma, Yoshihiro Fukuhara, Hirokatsu Kataoka, Yutaka Satoh
The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2026

Abstract

We address a fundamental gap in 3D-LLMs: existing models focus on single-object/scene description, struggling with detailed, inter-object comparison. We propose a framework for detailed object-level reasoning across multiple objects with three components: (1) MO3D (Multi-Object in 3D), an instruction dataset requiring fine-grained multi-object comparison; (2) Multi-3DLLM, using a minimal Patch-Interaction Transformer (PIT) that models inter/intra-object relationships while preserving local geometry; (3) Mini-apps, two application-driven benchmarks (Shape Mating, Change Captioning) that probe geometric understanding for practical use. Recent 3D-LLMs and 2D-VLMs perform poorly on these tasks, lacking both comparison-centric design and geometric awareness. In contrast, Multi-3DLLM trained on our mixture data learns geometric reasoning, surpasses all baselines on MO3D, and provides positive transfer to single-object classification.
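The abstract does not describe the internals of the Patch-Interaction Transformer, so the following is not the authors' implementation. It is a generic, minimal sketch of one way a single attention layer can mix intra-object and inter-object patch pairs: the patch tokens of two objects are concatenated and plain dot-product self-attention runs over the joint sequence. Everything here is an assumption made for illustration, including the function name `joint_attention`, the identity query/key/value projections, and the toy 2-D patch features.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def joint_attention(tokens_a, tokens_b):
    """Single-head self-attention over the concatenated patches of two objects.

    Hypothetical illustration: queries, keys and values are the raw tokens
    (identity projections). Returns the attended tokens and, per token, the
    share of attention weight that lands on the *other* object's patches.
    """
    tokens = tokens_a + tokens_b
    owner = [0] * len(tokens_a) + [1] * len(tokens_b)  # which object each patch belongs to
    dim = len(tokens[0])
    scale = math.sqrt(dim)

    outputs, inter_share = [], []
    for i, query in enumerate(tokens):
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in tokens]
        weights = softmax(scores)
        # Weighted sum of all patch tokens (intra- and inter-object alike).
        outputs.append([
            sum(w * tok[d] for w, tok in zip(weights, tokens)) for d in range(dim)
        ])
        # Attention mass spent on patches of the other object.
        inter_share.append(sum(w for w, o in zip(weights, owner) if o != owner[i]))
    return outputs, inter_share

# Toy patch features for two objects (made-up values).
patches_a = [[1.0, 0.0], [0.0, 1.0]]
patches_b = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
outputs, inter_share = joint_attention(patches_a, patches_b)
for i, share in enumerate(inter_share):
    print(f"patch {i}: {share:.2f} of its attention goes to the other object")
```

Because every patch attends to every other patch in the joint sequence, each output token carries information from its own object and from the other one, which is the general idea behind modeling both relationship types in one layer. A real model would add learned projections, multiple heads, and positional or geometric encodings.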
