Tool developed to detect fake videos (UC Riverside scientists develop tool to detect fake videos)

2025-07-18 University of California, Riverside (UCR)

A research team at the University of California, Riverside has developed UNITE, an AI model that detects fake videos with high accuracy whether or not a face appears in them. Unlike conventional face-centric methods, it also analyzes inconsistencies in backgrounds and motion, using a Transformer-based architecture and a custom "attention-diversity" loss to detect anomalies across multiple regions of a frame. Presented at CVPR 2025, the model achieves up to 99% accuracy even on videos without faces, a significant step forward for deepfake countermeasures.
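
The article itself contains no code; as a rough illustration of the pipeline it describes (domain-agnostic frame features from the SigLIP-So400M foundation model fed into a transformer-based classifier), here is a minimal PyTorch sketch. The class name, dimensions, pooling, and the choice to freeze the backbone are illustrative assumptions, not the authors' implementation:

```python
# Hypothetical sketch of a UNITE-style detector: per-frame features from a
# frozen SigLIP backbone, a small transformer attending across frames, and a
# binary real/fake head. Illustrative only, not the paper's architecture.
import torch
import torch.nn as nn
from transformers import SiglipVisionModel  # Hugging Face transformers

class SyntheticVideoDetector(nn.Module):
    def __init__(self, backbone="google/siglip-so400m-patch14-384",
                 d_model=1152, n_layers=4, n_heads=8):  # 1152 = So400M width
        super().__init__()
        self.backbone = SiglipVisionModel.from_pretrained(backbone)
        self.backbone.requires_grad_(False)  # frozen, domain-agnostic features
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, n_layers)
        self.cls_head = nn.Linear(d_model, 2)  # real vs. synthetic/tampered

    def forward(self, frames):  # frames: (B, T, 3, H, W)
        b, t = frames.shape[:2]
        with torch.no_grad():
            feats = self.backbone(
                pixel_values=frames.flatten(0, 1)).pooler_output  # (B*T, d)
        h = self.temporal(feats.view(b, t, -1))  # attend across frames
        return self.cls_head(h.mean(dim=1))      # video-level logits
```

The training details that make the real system work, such as mixing task-irrelevant data with DeepFake datasets and the attention-diversity loss, are described in the abstract below.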

<Related Information>

Towards a Universal Synthetic Video Detector: From Face or Background Manipulations to Fully AI-Generated Content

Rohit Kundu, Hao Xiong, Vishal Mohanty, Athula Balachandran, Amit K. Roy-Chowdhury
arXiv, submitted on 16 Dec 2024
DOI: https://doi.org/10.48550/arXiv.2412.12278


Abstract

Existing DeepFake detection techniques primarily focus on facial manipulations, such as face-swapping or lip-syncing. However, advancements in text-to-video (T2V) and image-to-video (I2V) generative models now allow fully AI-generated synthetic content and seamless background alterations, challenging face-centric detection methods and demanding more versatile approaches.
To address this, we introduce the Universal Network for Identifying Tampered and synthEtic videos (UNITE) model, which, unlike traditional detectors, captures full-frame manipulations. UNITE extends detection capabilities to scenarios without faces, non-human subjects, and complex background modifications. It leverages a transformer-based architecture that processes domain-agnostic features extracted from videos via the SigLIP-So400M foundation model. Given limited datasets encompassing both facial/background alterations and T2V/I2V content, we integrate task-irrelevant data alongside standard DeepFake datasets in training. We further mitigate the model's tendency to over-focus on faces by incorporating an attention-diversity (AD) loss, which promotes diverse spatial attention across video frames. Combining AD loss with cross-entropy improves detection performance across varied contexts. Comparative evaluations demonstrate that UNITE outperforms state-of-the-art detectors on datasets (in cross-data settings) featuring face/background manipulations and fully synthetic T2V/I2V videos, showcasing its adaptability and generalizable detection capabilities.
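
The abstract does not give the exact form of the attention-diversity (AD) loss. One plausible reading, sketched below in PyTorch, penalizes pairwise cosine similarity between the spatial attention maps of different heads so that attention spreads beyond faces; the function names, the (B, H, N) layout, and the weight `lam` are assumptions, not the paper's formulation:

```python
# Hypothetical attention-diversity penalty: discourage attention heads from
# all concentrating on the same spatial region (e.g., the face) by penalizing
# the mean pairwise cosine similarity between their attention maps.
# An illustrative guess at the idea, not the paper's exact AD loss.
import torch
import torch.nn.functional as F

def attention_diversity_loss(attn):
    """attn: (B, H, N) attention weights of H >= 2 heads over N spatial tokens."""
    attn = F.normalize(attn, dim=-1)       # unit-normalize each head's map
    sim = attn @ attn.transpose(1, 2)      # (B, H, H) pairwise cosine similarity
    h = attn.size(1)
    off_diag = sim - torch.diag_embed(sim.diagonal(dim1=1, dim2=2))  # zero the diagonal
    return off_diag.sum(dim=(1, 2)) / (h * (h - 1))  # mean off-diagonal similarity

def total_loss(logits, labels, attn, lam=0.1):
    # Cross-entropy for real/fake plus the diversity term, matching the
    # abstract's "AD loss with cross-entropy"; lam is an assumed weight.
    return F.cross_entropy(logits, labels) + lam * attention_diversity_loss(attn).mean()
```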
