2025-12-02 Tokyo University of Science / National Institute of Advanced Industrial Science and Technology (AIST)

Figure 1: Approximate Domain Unlearning
<Related Information>
- https://www.tus.ac.jp/today/archive/20251202_8376.html
- https://kodaikawamura.github.io/Domain_Unlearning/
Approximate Domain Unlearning for Vision-Language Models
Kodai Kawamura, Yuta Goto, Rintaro Yanagi, Hirokatsu Kataoka, Go Irie
Neural Information Processing Systems (NeurIPS 2025)
Abstract
Pre-trained Vision-Language Models (VLMs) exhibit strong generalization capabilities, enabling them to recognize a wide range of objects across diverse domains without additional training. However, they often retain irrelevant information beyond the requirements of specific target downstream tasks, raising concerns about computational efficiency and potential information leakage. This has motivated growing interest in approximate unlearning, which aims to selectively remove unnecessary knowledge while preserving overall model performance. Existing approaches to approximate unlearning have primarily focused on class unlearning, where a VLM is retrained to fail to recognize specified object classes while maintaining accuracy for others. However, merely forgetting object classes is often insufficient in practical applications. For instance, an autonomous driving system should accurately recognize real cars, while avoiding misrecognition of illustrated cars depicted in roadside advertisements as real cars, which could be hazardous. In this paper, we introduce Approximate Domain Unlearning (ADU), a novel problem setting that requires reducing recognition accuracy for images from specified domains (e.g., illustration) while preserving accuracy for other domains (e.g., real). ADU presents new technical challenges: due to the strong domain generalization capability of pre-trained VLMs, domain distributions are highly entangled in the feature space, making naive approaches based on penalizing target domains ineffective. To tackle this limitation, we propose a novel approach that explicitly disentangles domain distributions and adaptively captures instance-specific domain information. Extensive experiments on three multi-domain benchmark datasets demonstrate that our approach significantly outperforms strong baselines built upon state-of-the-art VLM tuning techniques, paving the way for practical and fine-grained unlearning in VLMs.
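The abstract states that naive approaches based on penalizing target domains are ineffective because domain distributions are entangled in the feature space of pre-trained VLMs. For concreteness, here is a minimal PyTorch sketch of what such a naive domain-penalizing baseline could look like: keep a standard cross-entropy loss on retain-domain samples while pushing the model's predictions on forget-domain samples toward the uniform distribution. The function name `adu_naive_loss`, the `is_forget_domain` mask, and the weight `alpha` are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def adu_naive_loss(logits: torch.Tensor,
                   labels: torch.Tensor,
                   is_forget_domain: torch.Tensor,
                   alpha: float = 1.0) -> torch.Tensor:
    """Naive domain-penalizing objective (baseline sketch, not the paper's method).

    logits:           (batch, num_classes) classification scores from the VLM head.
    labels:           (batch,) ground-truth class indices.
    is_forget_domain: (batch,) bool mask, True for samples from the domain to forget.
    alpha:            weight on the forgetting penalty (hypothetical hyperparameter).
    """
    retain = ~is_forget_domain

    # Preserve accuracy on the retain domains with ordinary cross-entropy.
    if retain.any():
        loss_retain = F.cross_entropy(logits[retain], labels[retain])
    else:
        loss_retain = logits.new_zeros(())

    # Penalize the forget domain by pulling its predictive distribution
    # toward uniform (i.e., maximizing predictive entropy).
    if is_forget_domain.any():
        log_probs = F.log_softmax(logits[is_forget_domain], dim=-1)
        uniform = torch.full_like(log_probs, 1.0 / logits.size(-1))
        loss_forget = F.kl_div(log_probs, uniform, reduction="batchmean")
    else:
        loss_forget = logits.new_zeros(())

    return loss_retain + alpha * loss_forget
```

As the abstract explains, a penalty of this kind fails in practice precisely because the domain distributions are highly entangled in the VLM feature space: suppressing the forget domain also degrades the retain domains. The paper's proposed approach instead explicitly disentangles the domain distributions and adaptively captures instance-specific domain information.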


