Helping robots zero in on the objects that matter

2024-09-30 Massachusetts Institute of Technology (MIT)

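MIT researchers have developed a new method called Clio that lets a robot quickly identify the objects it needs based on the task it has been given. Clio maps a scene in real time and extracts only the task-relevant objects so that the robot can recognize and manipulate them. The technique could be applied to rescue operations, factory work, and household robots, supporting efficient task completion in complex environments.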

<Related information>

Clio: Real-Time Task-Driven Open-Set 3D Scene Graphs

Dominic Maggio; Yun Chang; Nathan Hughes;…
IEEE Robotics and Automation Letters, Published: 29 August 2024
DOI: https://doi.org/10.1109/LRA.2024.3451395

Abstract:

Modern tools for class-agnostic image segmentation (e.g., SegmentAnything) and open-set semantic understanding (e.g., CLIP) provide unprecedented opportunities for robot perception and mapping. While traditional closed-set metric-semantic maps were restricted to tens or hundreds of semantic classes, we can now build maps with a plethora of objects and countless semantic variations. This leaves us with a fundamental question: what is the right granularity for the objects (and, more generally, for the semantic concepts) the robot has to include in its map representation? While related work implicitly chooses a level of granularity by tuning thresholds for object detection, we argue that such a choice is intrinsically task-dependent. The first contribution of this paper is to propose a task-driven 3D scene understanding problem, where the robot is given a list of tasks in natural language, and has to select the granularity and the subset of objects and scene structure to retain in its map that is sufficient to complete the tasks. We show that this problem can be naturally formulated using the Information Bottleneck (IB), an established information-theoretic framework to discuss task-relevance. The second contribution is an algorithm for task-driven 3D scene understanding based on an Agglomerative IB approach, that is able to cluster 3D primitives in the environment into task-relevant objects and regions. The third contribution is to integrate our task-driven clustering algorithm into a real-time pipeline, named Clio, that constructs a hierarchical 3D scene graph of the environment online and using only onboard compute. Our final contribution is an extensive experimental campaign showing that Clio not only allows real-time construction of compact open-set 3D scene graphs, but also improves the accuracy of task execution by limiting the map to relevant semantic concepts.
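
To make the Agglomerative IB idea in the abstract more concrete, here is a minimal Python sketch of the generic, textbook agglomerative Information Bottleneck procedure. It is not Clio's implementation. The function names, the `max_loss_fraction` stopping rule, and the toy inputs are all assumptions made for illustration. The sketch takes a prior `p_x` over primitives and a table `p_y_given_x` of task relevance per primitive (in practice this might be derived from similarity between primitive and task embeddings, which is also an assumption here), then greedily merges the pair of clusters whose merge loses the least information about the tasks.

```python
import numpy as np


def _kl(a, b):
    """KL divergence KL(a || b) in nats, skipping zero entries of a."""
    mask = a > 0
    return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))


def js_divergence(p, q, w_p, w_q):
    """Weighted Jensen-Shannon divergence with weights w_p + w_q = 1."""
    m = w_p * p + w_q * q
    return w_p * _kl(p, m) + w_q * _kl(q, m)


def mutual_information(p_c, p_y_given_c):
    """I(C; Y) in nats for cluster prior p_c and conditionals p(y|c)."""
    p_y = p_c @ p_y_given_c
    return float(sum(pc * _kl(row, p_y) for pc, row in zip(p_c, p_y_given_c)))


def agglomerative_ib(p_x, p_y_given_x, max_loss_fraction=0.1):
    """Greedy agglomerative IB clustering (illustrative sketch).

    Assumes every entry of p_x is strictly positive. Merging stops once
    the cumulative loss of I(C; Y) would exceed max_loss_fraction of the
    initial I(X; Y); this stopping rule is a hypothetical choice.
    """
    clusters = [[i] for i in range(len(p_x))]
    p_c = [float(p) for p in p_x]
    p_y_c = [np.asarray(row, dtype=float) for row in p_y_given_x]
    initial_info = mutual_information(np.array(p_c), np.array(p_y_c))
    lost = 0.0

    while len(clusters) > 1:
        # Find the pair whose merge loses the least task information.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                w = p_c[i] + p_c[j]
                cost = w * js_divergence(p_y_c[i], p_y_c[j], p_c[i] / w, p_c[j] / w)
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        cost, i, j = best
        if initial_info > 0 and (lost + cost) / initial_info > max_loss_fraction:
            break
        # Merge cluster j into cluster i.
        w = p_c[i] + p_c[j]
        p_y_c[i] = (p_c[i] * p_y_c[i] + p_c[j] * p_y_c[j]) / w
        p_c[i] = w
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j], p_c[j], p_y_c[j]
        lost += cost
    return clusters


# Toy example: four primitives, two tasks. Primitives 0 and 1 matter
# mostly for task 0; primitives 2 and 3 matter mostly for task 1.
toy_p_x = np.full(4, 0.25)
toy_p_y_given_x = np.array([[0.9, 0.1], [0.9, 0.1], [0.1, 0.9], [0.1, 0.9]])
toy_clusters = agglomerative_ib(toy_p_x, toy_p_y_given_x)
print(toy_clusters)
```

In this toy case the two pairs of primitives with identical task relevance merge at zero cost, while merging the resulting two groups would discard all task information, so the procedure stops with two clusters. A real pipeline would likely also restrict merges to spatially adjacent primitives; that constraint is omitted here for brevity.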
