2025-06-20 Institute of Science Tokyo
Figure 1. Challenges of real-time GNN processing on large-scale graphs
<Related Information>
- https://www.isct.ac.jp/ja/news/vun1gptyhhql
- https://www.isct.ac.jp/plugins/cms/component_download_file.php?type=2&pageId=&contentsId=1&contentsDataId=1767&prevId=&key=cd07b4a729fcce478b446f7b02fe524c.pdf
- https://dl.acm.org/doi/10.1145/3695053.3731115
BingoGCN: Towards Scalable and Efficient GNN Acceleration with Fine-Grained Partitioning and SLT
Jiale Yan, Hiroaki Ito, Yuta Nagahara, Kazushi Kawamura, Masato Motomura, Thiem Van Chu, Daichi Fujiki
ISCA ’25: Proceedings of the 52nd Annual International Symposium on Computer Architecture. Published: 20 June 2025.
DOI: https://doi.org/10.1145/3695053.3731115
Abstract
Graph Neural Networks (GNNs) are increasingly popular due to their wide applicability to tasks that require understanding unstructured graph data, such as social network analysis and autonomous driving. However, real-time, large-scale GNN inference is challenging because node features and adjacency matrices are large, and their irregular memory access patterns incur heavy memory traffic and large on-chip buffer requirements. Graph partitioning helps by localizing access patterns and shrinking on-chip buffers, but fine-grained partitioning increases the number of inter-partition edges, and the resulting off-chip memory accesses degrade overall performance.
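To make the trade-off concrete, the following sketch (our illustration, not code from the paper) counts how many edges cross partition boundaries as the partition count grows. Random node-to-partition assignment exaggerates the cut, since real partitioners such as METIS minimize it, but the trend the abstract describes holds: finer partitions mean more inter-partition edges.

```python
# Illustrative only: measure the fraction of edges that cross partitions
# as partitioning becomes finer-grained. Graph size and partition counts
# are arbitrary choices for demonstration.
import numpy as np

rng = np.random.default_rng(0)
num_nodes, num_edges = 10_000, 50_000
edges = rng.integers(0, num_nodes, size=(num_edges, 2))  # random graph

for num_parts in (4, 64, 1024):
    part = rng.integers(0, num_parts, size=num_nodes)  # node -> partition id
    cut = np.count_nonzero(part[edges[:, 0]] != part[edges[:, 1]])
    print(f"{num_parts:5d} partitions: {cut / num_edges:.1%} of edges cross partitions")
```

Each crossing edge normally forces a fetch of a remote node's feature vector from off-chip memory; that irregular traffic is the overhead CMQ targets.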
To overcome these limitations, we propose BingoGCN, a scalable GNN acceleration framework that introduces a multidimensional dynamic feature summarization technique, Cross-Partition Message Quantization (CMQ), for inter-partition message passing. CMQ eliminates irregular off-chip memory accesses without additional training or accuracy loss, even under fine-grained partitioning. With the bottleneck shifted from memory to computation, BingoGCN enables further performance optimization through the Strong Lottery Ticket (SLT) theory, which uses randomly generated weights. BingoGCN addresses the challenge that SLT’s unstructured sparsity poses for hardware acceleration with a novel training algorithm and random weight generator designs, enabling fine-grained (FG) sparsity and improved load balancing. We integrated CMQ and FG-SLT into the message passing of GNNs and designed an efficient hardware architecture to support this flow. Our FPGA-based implementation achieves a significant reduction in memory accesses while preserving accuracy comparable to the original models.
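The abstract does not spell out CMQ’s internals, but its description, summarizing boundary features so that cross-partition messages avoid irregular off-chip reads, can be pictured with a k-means-style codebook. Everything below (function names, centroid counts, the clustering itself) is our hypothetical illustration of the idea, not the paper’s algorithm.

```python
# Hypothetical sketch of CMQ-style feature summarization (not the paper's
# implementation): boundary-node features are clustered into a small
# codebook, so remote partitions read centroids plus one small code per
# boundary node instead of full feature rows, keeping off-chip access
# small and regular.
import numpy as np

def summarize_boundary_features(feats, num_centroids, iters=10, seed=0):
    """Toy k-means over boundary-node features; returns (codebook, codes)."""
    rng = np.random.default_rng(seed)
    codebook = feats[rng.choice(len(feats), num_centroids, replace=False)]
    for _ in range(iters):
        # Assign each feature vector to its nearest centroid (L2 distance).
        dists = ((feats[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
        codes = dists.argmin(axis=1)
        # Recompute centroids; keep the old centroid if a cluster is empty.
        for c in range(num_centroids):
            members = feats[codes == c]
            if len(members):
                codebook[c] = members.mean(axis=0)
    return codebook, codes

boundary = np.random.default_rng(1).normal(size=(512, 128)).astype(np.float32)
codebook, codes = summarize_boundary_features(boundary, num_centroids=16)
print(codebook.shape, codes.shape)  # (16, 128) (512,)
```

Similarly, the SLT side can be pictured as a "supermask" over frozen random weights: the weights themselves are never trained, only a score per weight, and inference keeps the highest-scored fraction. This toy forward pass is again our sketch of the general SLT idea, not BingoGCN’s FG-SLT design.

```python
# Toy Strong Lottery Ticket (supermask) forward pass, our sketch only:
# weights W stay frozen at random values; per-weight scores (trained in a
# real system, random here) select the top fraction to keep at inference.
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((128, 64)).astype(np.float32)     # frozen random weights
scores = rng.standard_normal(W.shape).astype(np.float32)  # stand-in for trained scores

sparsity = 0.9  # drop the 90% lowest-scored weights
mask = (scores >= np.quantile(scores, sparsity)).astype(np.float32)

x = rng.standard_normal((1, 128)).astype(np.float32)
y = x @ (W * mask)  # unstructured, fine-grained sparse layer
print(f"kept {mask.mean():.0%} of weights, output shape {y.shape}")
```

Because the surviving weights are scattered (unstructured, fine-grained sparsity), balancing them across hardware lanes is nontrivial; that is the challenge the paper’s training algorithm and random weight generator designs address.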