2026-03-05 国立情報学研究所

<関連情報>
590ナノ秒、757GbpsのFPGAロス圧縮ネットワーク A 590-Nanosecond 757-Gbps FPGA Lossy Compressed Network
Michihiro Koibuchi; Takumi Honda; Naoto Fukumoto; Shoichi Hirasawa; Koji Nakano
IEEE Transactions on Parallel and Distributed Systems Published:02 February 2026
DOI:https://doi.org/10.1109/TPDS.2026.3659817
Abstract
Inter-FPGA communication bandwidth has become a limiting factor in scaling memory-intensive workloads on FPGA-based systems. While modern FPGAs integrate high-bandwidth memory (HBM) to increase local memory throughput, network interfaces often lag behind, creating an imbalance between computation and communication resources. Data compression is a technique to increase effective communication bandwidth by reducing the amount of data transferred, but existing solutions struggle to meet the performance and operation latency requirements of FPGA-based platforms. This paper presents a high-throughput lossy compression framework that enables sub-microsecond latency communication in FPGA clusters. The proposed design addresses the challenge of aligning variable-length compressed data with fixed-width network channels by using transpose circuits, memory-bank reordering, and word-wise operations. A run-length encoding scheme with bounded error is employed to compress floating-point and fixed-point data without relying on complex fine-grained bit-level manipulations, enabling low-latency and scalable implementation. The proposed architecture is implemented on a custom Stratix 10 MX2100 FPGA card equipped with eight 50 Gbps network ports and silicon photonics transceivers. The system achieves up to 757 Gbps of aggregate bandwidth per FPGA in collective communication operations. Compression and decompression are performed within 590 ns total latency, while maintaining the quality of results in a GradAllReduce workload for deep learning.


