New AI tool generates high-quality images 9 times faster (A new AI tool generates high-quality images faster)

2025-03-21 Massachusetts Institute of Technology (MIT)

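Researchers at MIT and NVIDIA have developed HART (Hybrid Autoregressive Transformer), a new tool that combines the strengths of two kinds of image-generation AI models. A fast autoregressive model first generates the overall structure of the image, and a small diffusion model then refines the details. This approach delivers image quality equal to or better than that of conventional diffusion models at roughly nine times the speed. HART also consumes fewer computational resources and can run on an ordinary laptop or smartphone. Because it combines efficiency with accuracy, it is expected to find a wide range of applications, including robot training and game design.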

<Related information>

HART: Efficient Visual Generation with Hybrid Autoregressive Transformer

Haotian Tang, Yecheng Wu, Shang Yang, Enze Xie, Junsong Chen, Junyu Chen, Zhuoyang Zhang, Han Cai, Yao Lu, Song Han
arXiv  Submitted on 14 Oct 2024
DOI:https://doi.org/10.48550/arXiv.2410.10812

Abstract

We introduce Hybrid Autoregressive Transformer (HART), an autoregressive (AR) visual generation model capable of directly generating 1024×1024 images, rivaling diffusion models in image generation quality. Existing AR models face limitations due to the poor image reconstruction quality of their discrete tokenizers and the prohibitive training costs associated with generating 1024px images. To address these challenges, we present the hybrid tokenizer, which decomposes the continuous latents from the autoencoder into two components: discrete tokens representing the big picture and continuous tokens representing the residual components that cannot be represented by the discrete tokens. The discrete component is modeled by a scalable-resolution discrete AR model, while the continuous component is learned with a lightweight residual diffusion module with only 37M parameters. Compared with the discrete-only VAR tokenizer, our hybrid approach improves reconstruction FID from 2.11 to 0.30 on MJHQ-30K, leading to a 31% generation FID improvement from 7.85 to 5.38. HART also outperforms state-of-the-art diffusion models in both FID and CLIP score, with 4.5-7.7x higher throughput and 6.9-13.4x lower MACs. Our code is open sourced at this https URL.
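
The hybrid tokenizer idea in the abstract (discrete tokens for the big picture, continuous tokens for the residual) can be illustrated with a toy sketch. This is not the HART implementation: the codebook size, the latent dimension, the single-stage nearest-neighbor quantizer, and all function names below are assumptions made for illustration. The residual is also stored exactly here, whereas in HART it is modeled by the lightweight residual diffusion module.

```python
import random

def nearest_code(vec, codebook):
    """Index of the codebook entry closest to vec (squared L2 distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((v - c) ** 2 for v, c in zip(vec, codebook[i])),
    )

def hybrid_decompose(latents, codebook):
    """Split continuous latents into discrete token ids and continuous residuals."""
    token_ids, residuals = [], []
    for vec in latents:
        idx = nearest_code(vec, codebook)
        token_ids.append(idx)
        # Residual: the part of the latent the discrete token cannot represent.
        residuals.append([v - c for v, c in zip(vec, codebook[idx])])
    return token_ids, residuals

def hybrid_reconstruct(token_ids, residuals, codebook):
    """Recombine the discrete component with its continuous residual."""
    return [
        [c + r for c, r in zip(codebook[idx], res)]
        for idx, res in zip(token_ids, residuals)
    ]

def mse(a, b):
    """Mean squared error between two equally shaped lists of vectors."""
    total = sum((x - y) ** 2 for u, v in zip(a, b) for x, y in zip(u, v))
    return total / (len(a) * len(a[0]))

# Hypothetical toy data: 16 codebook entries and 8 latent vectors of dimension 4.
rng = random.Random(0)
codebook = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(16)]
latents = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(8)]

token_ids, residuals = hybrid_decompose(latents, codebook)
discrete_only = [codebook[i] for i in token_ids]
hybrid = hybrid_reconstruct(token_ids, residuals, codebook)

discrete_error = mse(discrete_only, latents)  # quantization error remains
hybrid_error = mse(hybrid, latents)           # near zero up to float rounding
```

In this toy setting the discrete-only reconstruction keeps a quantization error, while adding the residual back recovers the latents almost exactly. That mirrors, in a very simplified way, why the abstract reports a better reconstruction FID for the hybrid tokenizer than for the discrete-only one.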
