New Attack Can Make AI ‘See’ Whatever You Want


2025-07-01 North Carolina State University (NC State)

Photo credit: Kevin Ku.

Researchers at North Carolina State University have developed RisingAttacK, a new attack technique targeting AI image-recognition systems. By making minimal alterations to an image, the method lets an attacker control what the AI "sees," causing it to misrecognize objects in the image such as cars, pedestrians, or road signs. Its key advantage is that it induces misrecognition with high reliability using fewer changes than previous techniques, and its effectiveness was confirmed on four major models, including ResNet-50. The team next aims to extend the approach to other AI domains (e.g., LLMs) and to develop defenses against it.
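The press release itself contains no code, so as a generic illustration of the underlying idea — that tiny, budgeted pixel edits can steer a classifier toward an attacker-chosen label — here is a minimal PGD-style sketch in PyTorch. It is not the RisingAttacK method (described in the paper below); the model, step size, and budget are placeholder assumptions.

```python
# A minimal, generic sketch of a targeted adversarial perturbation (PGD-style).
# This is NOT the RisingAttacK algorithm; it only illustrates how small,
# budgeted pixel edits can steer a classifier toward a chosen label.
# `model` is any image classifier; step sizes and budgets are placeholders.
import torch
import torch.nn.functional as F

def targeted_pgd(model, x, target_class, eps=4/255, alpha=1/255, steps=20):
    """Push image batch x (pixel values in [0, 1]) toward `target_class`
    while keeping the perturbation within an L-infinity budget `eps`."""
    target = torch.full((x.size(0),), target_class, dtype=torch.long)
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), target)
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()  # descend toward the target label
            delta.clamp_(-eps, eps)             # keep the edit imperceptible
        delta.grad.zero_()
    return (x + delta).clamp(0, 1).detach()
```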

<Related Information>

Adversarial Perturbations Are Formed by Iteratively Learning Linear Combinations of the Right Singular Vectors of the Adversarial Jacobian

Thomas Paniagua · Chinmay Savadikar · Tianfu Wu
International Conference on Machine Learning  July 15, 2025

Abstract

White-box targeted adversarial attacks reveal core vulnerabilities in Deep Neural Networks (DNNs), yet two key challenges persist: (i) How many target classes can be attacked simultaneously in a specified order, known as the *ordered top-K attack* problem (K ≥ 1)? (ii) How can the corresponding adversarial perturbations for a given benign image be computed directly in the image space?

We address both by showing that *ordered top-K perturbations can be learned via iteratively optimizing linear combinations of the right singular vectors of the adversarial Jacobian* (i.e., the logit-to-image Jacobian constrained by target ranking). These vectors span an orthogonal, informative subspace in the image domain.

We introduce **RisingAttacK**, a novel Sequential Quadratic Programming (SQP)-based method that exploits this structure. We propose a holistic figure-of-merit (FoM) metric combining attack success rates (ASRs) and ℓp-norms (p = 1, 2, ∞).

Extensive experiments on ImageNet-1k across seven ordered top-K levels (K = 1, 5, 10, 15, 20, 25, 30) and four models (ResNet-50, DenseNet-121, ViT-B, DeiT-B) show that RisingAttacK consistently surpasses the state-of-the-art QuadAttacK.
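The abstract's core construction can be sketched directly: form the Jacobian of the targeted logits with respect to the image, take its SVD, and perturb the image inside the span of the top right singular vectors. The PyTorch sketch below shows only that subspace idea; RisingAttacK's SQP solver, ranking constraints, and iterative coefficient learning are omitted, and all function and variable names are illustrative assumptions.

```python
# Sketch of the subspace construction named in the abstract: take the Jacobian
# of the targeted logits with respect to the image (the "adversarial Jacobian"),
# SVD it, and perturb the image with a linear combination of its top right
# singular vectors. The SQP solver and ordered-ranking constraints are omitted.
import torch
from torch.func import jacrev

def right_singular_subspace(model, x, class_ids, r=8):
    """Top-r right singular vectors of d(logits[class_ids]) / d(image)."""
    flat = x.flatten()

    def selected_logits(v):
        # Reshape the flat vector back to an image and select target logits.
        return model(v.view_as(x).unsqueeze(0))[0, class_ids]

    J = jacrev(selected_logits)(flat)      # shape: (len(class_ids), num_pixels)
    _, _, Vh = torch.linalg.svd(J, full_matrices=False)
    return Vh[:r]                          # rows span an orthogonal image-space subspace

def apply_perturbation(x, V, coeffs):
    """delta = coeffs @ V: a learned linear combination of the basis rows."""
    delta = (coeffs @ V).view_as(x)
    return (x + delta).clamp(0, 1)
```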

Lay Summary

Deep neural networks (DNNs) are highly accurate but remain vulnerable to adversarial attacks: small, often imperceptible changes to input images that cause incorrect outputs. While most attacks focus on altering the top-1 prediction, many real-world systems (e.g., search engines, medical triage) rely on the entire ranked list of outputs. This raises a key question: how can we trick a DNN into producing an ordered set of incorrect predictions?

We address this with RisingAttacK, a novel method that directly learns adversarial perturbations in image space. Using Sequential Quadratic Programming, it optimizes minimal, interpretable changes that manipulate the model's top-K ranking. The attack leverages linear combinations of the most sensitive directions, derived from the adversarial Jacobian, to efficiently disrupt the model's output ordering.

RisingAttacK consistently outperforms prior state-of-the-art attacks across four major models and ranking depths (K = 1 to 30), achieving higher success rates and lower perturbation norms.

By enabling precise manipulation of ranked outputs, our method delivers the kind of comprehensive stress tests increasingly demanded by regulators and practitioners, tests that top-1-only attacks simply cannot provide.
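To make the "ordered top-K" goal concrete: the attack counts as successful only when the model's top-K predicted classes match the attacker's desired classes in exactly the specified order. A hypothetical checker for that criterion:

```python
# Hypothetical success check for the ordered top-K attack goal: the attack
# succeeds only if the model's top-K predicted classes equal the attacker's
# desired classes in exactly the specified order.
import torch

def ordered_topk_success(logits: torch.Tensor, desired: list[int]) -> bool:
    topk = torch.topk(logits, k=len(desired)).indices.tolist()
    return topk == desired
```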
