2026-05-26 アルゴンヌ国立研究所(ANL)

Argonne’s inference service is powered by ALCF systems including Sophia (left) and Metis (right). (Image by Argonne National Laboratory.)
<関連情報>
- https://www.anl.gov/article/argonne-launches-first-largescale-ai-inference-service-for-open-science
- https://dl.acm.org/doi/10.1145/3731599.3767346
FIRST: 科学AIモデルアクセス向け連合型推論リソーススケジューリングツールキット FIRST: Federated Inference Resource Scheduling Toolkit for Scientific AI Model Access
Aditya Tanikanti, Benoit Côté, Yanfei Guo, Le Chen, Nickolaus Saint, Ryan Chard, + 6
SC Workshops ’25: Proceedings of the SC ’25 Workshops of the International Conference for High Performance Computing Published: 15 November 2025
DOI:https://doi.org/10.1145/3731599.3767346
Abstract
We present the Federated Inference Resource Scheduling Toolkit (FIRST), a framework enabling Inference-as-a-Service across distributed High-Performance Computing (HPC) clusters. FIRST provides cloud-like access to diverse AI models, like Large Language Models (LLMs), on existing HPC infrastructure. Leveraging Globus Auth and Globus Compute, the system allows researchers to run parallel inference workloads via an OpenAI-compliant API on private, secure environments. This cluster-agnostic API allows requests to be distributed across federated clusters, targeting numerous hosted models. FIRST supports multiple inference backends (e.g., vLLM), auto-scales resources, maintains “hot” nodes for low-latency execution, and offers both high-throughput batch and interactive modes. The framework addresses the growing demand for private, secure, and scalable AI inference in scientific workflows, allowing researchers to generate billions of tokens daily on-premises without relying on commercial cloud infrastructure.

