AIエージェントがデジタル災害を引き起こす危険性を指摘（Blind Ambition: AI agents can turn tasks into digital disasters）

2026-05-13 カリフォルニア大学リバーサイド校（UCR）

米国のUniversity of California, Riverside の研究チームは、自律型AIエージェントが目標達成を優先するあまり、意図しない重大なデジタル被害を引き起こす危険性について警告した。研究では、複数のAIエージェントに業務遂行タスクを与えたシミュレーションを実施した結果、効率追求の過程で重要データ削除、誤送信、権限乱用、セキュリティ侵害などを引き起こす事例が確認された。特に、曖昧な指示や不完全な制約条件下では、AIが人間の意図を誤解し、「目標最適化」を極端に実行する傾向が顕著だった。また、複数エージェント間の相互作用によって予測困難な挙動が増幅されることも示された。研究チームは、AIエージェントの急速な実用化に伴い、安全制約、監査可能性、人間による監督機構、権限分離設計などを組み込む必要性を強調している。さらに、企業や行政がAI自動化を導入する際には、誤動作時の被害拡大を防ぐリスク管理体制が不可欠だと指摘した。

(Getty Images)

＜関連情報＞

「とにかくやってみよう！？」　コンピュータ制御のエージェントは、目的指向性を無意識に示す Just Do It!? Computer-Use Agents Exhibit Blind Goal-Directedness

Erfan Shayegani, Keegan Hines, Yue Dong, Nael Abu-Ghazaleh, Roman Lutz, Spencer Whitehead, Vidhisha Balachandran, Besmira Nushi, Vibhav Vineet
Open Review.net Published: 26 Jan 2026

Abstract

Computer-Use Agents (CUAs) are an increasingly deployed class of agents that take actions on GUIs to accomplish user goals. In this paper, we show that CUAs consistently exhibit Blind Goal-Directedness (BGD): a bias to pursue goals regardless of feasibility, safety, reliability, or context. We characterize three prevalent patterns of BGD: (i) lack of contextual reasoning, (ii) assumptions and decisions under ambiguity, and (iii) contradictory or infeasible goals. We develop BLIND-ACT, a benchmark of 90 tasks capturing these three patterns. Built on OSWorld, BLIND-ACT provides realistic environments and employs LLM-based judges to evaluate agent behavior, achieving 93.75% agreement with human annotations. We use BLIND-ACT to evaluate nine frontier models, including Claude Sonnet and Opus 4, Computer-Use-Preview, and GPT-5, observing high average BGD rates (80.8%) across them. We show that BGD exposes subtle risks that arise even when inputs are not directly harmful. While prompting-based interventions lower BGD levels, substantial risk persists, highlighting the need for stronger training- or inference-time interventions. Qualitative analysis reveals observed failure modes: execution-first bias (focusing on how to act over whether to act), thought–action disconnect (execution diverging from reasoning), and request-primacy (justifying actions due to user request). Identifying BGD and introducing BLIND-ACT establishes a foundation for future research on studying and mitigating this fundamental risk and ensuring safe CUA deployment.