2025-06-20 Institute of Science Tokyo
Figure 1. Challenges of real-time GNN processing on large-scale graphs
<Related Information>
- https://www.isct.ac.jp/ja/news/vun1gptyhhql
- https://www.isct.ac.jp/plugins/cms/component_download_file.php?type=2&pageId=&contentsId=1&contentsDataId=1767&prevId=&key=cd07b4a729fcce478b446f7b02fe524c.pdf
- https://dl.acm.org/doi/10.1145/3695053.3731115
BingoGCN: Towards Scalable and Efficient GNN Acceleration with Fine-Grained Partitioning and SLT
Jiale Yan, Hiroaki Ito, Yuta Nagahara, Kazushi Kawamura, Masato Motomura, Thiem Van Chu, Daichi Fujiki
ISCA ’25: Proceedings of the 52nd Annual International Symposium on Computer Architecture. Published: 20 June 2025.
DOI: https://doi.org/10.1145/3695053.3731115
Abstract
Graph Neural Networks (GNNs) are increasingly popular due to their wide applicability to tasks that require understanding unstructured graph data, such as social network analysis and autonomous driving. However, real-time, large-scale GNN inference is challenging because node features and adjacency matrices are large, and their irregular memory access patterns incur heavy memory traffic and large on-chip buffer requirements. Graph partitioning helps by localizing access patterns and shrinking on-chip buffers, but fine-grained partitioning increases the number of inter-partition edges, and the resulting off-chip memory accesses degrade overall performance.
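To make the trade-off concrete, the following sketch (our illustration, not code from the paper) counts how many edges cross partition boundaries as the partition count grows. Random node-to-partition assignment exaggerates the cut, since real partitioners such as METIS minimize it, but the trend the abstract describes holds: finer partitions mean more inter-partition edges.

```python
# Illustrative only: measure the fraction of edges that cross partitions
# as partitioning becomes finer-grained. Graph size and partition counts
# are arbitrary choices for demonstration.
import numpy as np

rng = np.random.default_rng(0)
num_nodes, num_edges = 10_000, 50_000
edges = rng.integers(0, num_nodes, size=(num_edges, 2))  # random graph

for num_parts in (4, 64, 1024):
    part = rng.integers(0, num_parts, size=num_nodes)  # node -> partition id
    cut = np.count_nonzero(part[edges[:, 0]] != part[edges[:, 1]])
    print(f"{num_parts:5d} partitions: {cut / num_edges:.1%} of edges cross partitions")
```

Each crossing edge normally forces a fetch of a remote node's feature vector from off-chip memory; that irregular traffic is the overhead CMQ targets.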
To overcome these limitations, we propose BingoGCN, a scalable GNN acceleration framework that introduces a multidimensional dynamic feature summarization technique, Cross-Partition Message Quantization (CMQ), for inter-partition message passing. CMQ eliminates irregular off-chip memory accesses without additional training or accuracy loss, even under fine-grained partitioning. With the bottleneck shifted from memory to computation, BingoGCN enables further performance optimization through the Strong Lottery Ticket (SLT) theory, which uses randomly generated weights. BingoGCN addresses the challenge that SLT’s unstructured sparsity poses for hardware acceleration with a novel training algorithm and random weight generator designs, enabling fine-grained (FG) sparsity and improved load balancing. We integrated CMQ and FG-SLT into the message passing of GNNs and designed an efficient hardware architecture to support this flow. Our FPGA-based implementation achieves a significant reduction in memory accesses while preserving accuracy comparable to the original models.
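The abstract does not spell out CMQ’s internals, but its description, summarizing boundary features so that cross-partition messages avoid irregular off-chip reads, can be pictured with a k-means-style codebook. Everything below (function names, centroid counts, the clustering itself) is our hypothetical illustration of the idea, not the paper’s algorithm.

```python
# Hypothetical sketch of CMQ-style feature summarization (not the paper's
# implementation): boundary-node features are clustered into a small
# codebook, so remote partitions read centroids plus one small code per
# boundary node instead of full feature rows, keeping off-chip access
# small and regular.
import numpy as np

def summarize_boundary_features(feats, num_centroids, iters=10, seed=0):
    """Toy k-means over boundary-node features; returns (codebook, codes)."""
    rng = np.random.default_rng(seed)
    codebook = feats[rng.choice(len(feats), num_centroids, replace=False)]
    for _ in range(iters):
        # Assign each feature vector to its nearest centroid (L2 distance).
        dists = ((feats[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
        codes = dists.argmin(axis=1)
        # Recompute centroids; keep the old centroid if a cluster is empty.
        for c in range(num_centroids):
            members = feats[codes == c]
            if len(members):
                codebook[c] = members.mean(axis=0)
    return codebook, codes

boundary = np.random.default_rng(1).normal(size=(512, 128)).astype(np.float32)
codebook, codes = summarize_boundary_features(boundary, num_centroids=16)
print(codebook.shape, codes.shape)  # (16, 128) (512,)
```

Similarly, the SLT side can be pictured as a "supermask" over frozen random weights: the weights themselves are never trained, only a score per weight, and inference keeps the highest-scored fraction. This toy forward pass is again our sketch of the general SLT idea, not BingoGCN’s FG-SLT design.

```python
# Toy Strong Lottery Ticket (supermask) forward pass, our sketch only:
# weights W stay frozen at random values; per-weight scores (trained in a
# real system, random here) select the top fraction to keep at inference.
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((128, 64)).astype(np.float32)     # frozen random weights
scores = rng.standard_normal(W.shape).astype(np.float32)  # stand-in for trained scores

sparsity = 0.9  # drop the 90% lowest-scored weights
mask = (scores >= np.quantile(scores, sparsity)).astype(np.float32)

x = rng.standard_normal((1, 128)).astype(np.float32)
y = x @ (W * mask)  # unstructured, fine-grained sparse layer
print(f"kept {mask.mean():.0%} of weights, output shape {y.shape}")
```

Because the surviving weights are scattered (unstructured, fine-grained sparsity), balancing them across hardware lanes is nontrivial; that is the challenge the paper’s training algorithm and random weight generator designs address.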