Technique developed to teach AI to answer "I don't know" (Teaching AI to admit uncertainty)

2025-06-26 Johns Hopkins University (JHU)

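Research at Johns Hopkins University has produced a new method that improves an AI system's ability to answer "I don't know" when it is uncertain. Using three "odds" settings that assign different costs to a wrong answer (exam, quiz show, and high-risk), the researchers had large language models decide whether to respond based on their confidence. In the high-risk setting in particular, the models more often stayed silent rather than risk a wrong answer, which contributes to improved reliability. The work has important implications for applying AI in fields such as medicine and law.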
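The idea of answering only when confidence justifies the risk can be illustrated with a small sketch. This is not the researchers' implementation: the penalty values, the `Prediction` type, and the function names below are all hypothetical, and the threshold rule is simply the break-even point of expected utility, assuming a correct answer earns a fixed reward, a wrong answer costs a penalty, and abstaining scores zero.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical wrong-answer penalties for the three settings. The article names
# the settings but gives no numbers, so these values are illustrative only.
ODDS = {"exam": 0.0, "quiz_show": 1.0, "high_risk": 20.0}


@dataclass
class Prediction:
    answer: str
    confidence: float  # assumed to be a probability in [0, 1]
    correct: bool      # ground truth, used only for scoring


def break_even_threshold(penalty: float, reward: float = 1.0) -> float:
    """Confidence at which answering and abstaining have equal expected utility.

    Answering yields p * reward - (1 - p) * penalty; abstaining yields 0.
    Setting the two equal gives p = penalty / (reward + penalty).
    """
    return penalty / (reward + penalty)


def respond(pred: Prediction, threshold: float) -> Optional[str]:
    """Return the answer if confidence reaches the threshold, else None (abstain)."""
    return pred.answer if pred.confidence >= threshold else None


def total_utility(preds: List[Prediction], penalty: float, reward: float = 1.0) -> float:
    """Score a batch: +reward if answered correctly, -penalty if wrong, 0 if abstained."""
    threshold = break_even_threshold(penalty, reward)
    total = 0.0
    for pred in preds:
        if respond(pred, threshold) is None:
            continue  # abstention contributes nothing
        total += reward if pred.correct else -penalty
    return total


if __name__ == "__main__":
    preds = [
        Prediction("A", 0.99, True),
        Prediction("B", 0.60, False),
        Prediction("C", 0.30, False),
    ]
    for name, penalty in ODDS.items():
        print(name, break_even_threshold(penalty), total_utility(preds, penalty))
```

Under these assumed penalties, the break-even threshold rises from 0 in the exam setting to 0.5 in the quiz-show setting and to 20/21 in the high-risk setting, so the same model stays silent more often as the cost of a wrong answer grows.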

<Related information>

Is That Your Final Answer? Test-Time Scaling Improves Selective Question Answering

William Jurayj, Jeffrey Cheng, Benjamin Van Durme
arXiv Submitted on 19 Feb 2025
DOI:https://doi.org/10.48550/arXiv.2502.13962

Abstract

Scaling the test-time compute of large language models has demonstrated impressive performance on reasoning benchmarks. However, existing evaluations of test-time scaling make the strong assumption that a reasoning system should always give an answer to any question provided. This overlooks concerns about whether a model is confident in its answer, and whether it is appropriate to always provide a response. To address these concerns, we extract confidence scores during reasoning for thresholding model responses. We find that increasing compute budget at inference time not only helps models answer more questions correctly, but also increases confidence in correct responses. We then extend the current paradigm of zero-risk responses during evaluation by considering settings with non-zero levels of response risk, and suggest a recipe for reporting evaluations under these settings.
