2026-02-10 Max Planck Institute

<Related Information>
- https://www.mpg.de/26123485/ai-benefits-from-measured-non-linearity
- https://openreview.net/forum?id=qI2Vt9P9rl
Uncovering the Computational Roles of Nonlinearity in Sequence Modeling Using Almost-Linear RNNs
Manuel Brenner, Georgia Koppe
OpenReview Published: 09 Jan 2026
Abstract
Sequence modeling tasks across domains such as natural language processing, time-series forecasting, speech recognition, and control require learning complex mappings from input to output sequences. In recurrent networks, nonlinear recurrence is theoretically required to universally approximate such sequence-to-sequence functions; yet in practice, linear recurrent models have often proven surprisingly effective. This raises the question of when nonlinearity is truly required. In this study, we present a framework to systematically dissect the functional role of nonlinearity in recurrent networks, allowing us to identify both when it is computationally necessary and what mechanisms it enables. We address the question using Almost-Linear Recurrent Neural Networks (AL-RNNs), which allow the recurrence nonlinearity to be gradually attenuated and decompose network dynamics into analyzable linear regimes, making the underlying computational mechanisms explicit. We illustrate the framework across a diverse set of synthetic and real-world tasks, including classic sequence modeling benchmarks, an empirical neuroscientific stimulus-selection task, and a multi-task suite. We demonstrate how the AL-RNN’s piecewise linear structure enables direct identification of computational primitives such as gating, rule-based integration, and memory-dependent transients, revealing that these operations emerge within predominantly linear dynamical backbones. Across tasks, sparse nonlinearity plays several functional roles: it improves interpretability by reducing and localizing nonlinear computations, promotes shared (rather than highly distributed) representations in multi-task settings, and reduces computational cost by limiting nonlinear operations. Moreover, sparse nonlinearity acts as a useful inductive bias: in low-data regimes, or when tasks require discrete switching between linear regimes, sparsely nonlinear models often match or exceed the performance of fully nonlinear architectures. Our findings provide a principled approach for identifying where nonlinearity is functionally necessary in sequence models, guiding the design of recurrent architectures that balance performance, efficiency, and mechanistic interpretability.
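
The core idea in the abstract, attenuating the recurrence nonlinearity so that only a few latent units are rectified while the rest evolve linearly, can be made concrete in a few lines. Below is a minimal PyTorch sketch, assuming the piecewise-linear update z_t = A z_{t-1} + W phi_P(z_{t-1}) + C s_t + h used in earlier AL-RNN work, where phi_P applies a ReLU to only p of the d latent units. The class and parameter names (ALRNNCell, d, p) are illustrative, not the authors' code.

import torch
import torch.nn as nn

class ALRNNCell(nn.Module):
    """Minimal Almost-Linear RNN cell (illustrative sketch).

    Only the last p of the d latent units pass through a ReLU; the
    remaining d - p units evolve purely linearly. p = 0 gives a linear
    RNN, p = d a standard piecewise-linear (ReLU) RNN, so p is the knob
    that gradually attenuates the recurrence nonlinearity.
    """

    def __init__(self, d: int, p: int, input_dim: int):
        super().__init__()
        assert 0 <= p <= d
        self.d, self.p = d, p
        self.A = nn.Parameter(0.9 * torch.eye(d))               # linear recurrence
        self.W = nn.Parameter(0.1 * torch.randn(d, d))          # weights on rectified state
        self.C = nn.Parameter(0.1 * torch.randn(d, input_dim))  # input weights
        self.h = nn.Parameter(torch.zeros(d))                   # bias

    def forward(self, z: torch.Tensor, s: torch.Tensor) -> torch.Tensor:
        # Rectify only the last p latent units; identity on the rest.
        phi = torch.cat(
            [z[..., : self.d - self.p], torch.relu(z[..., self.d - self.p:])],
            dim=-1,
        )
        return z @ self.A.T + phi @ self.W.T + s @ self.C.T + self.h

# Rolling the cell over a sequence seq of shape (batch, T, input_dim):
# cell = ALRNNCell(d=20, p=3, input_dim=5)
# z = torch.zeros(8, 20)
# for t in range(seq.shape[1]):
#     z = cell(z, seq[:, t])

Because each ReLU unit is either active or inactive at any state, the joint on/off pattern of the p nonlinear units partitions the latent space into at most 2^p regions, in each of which the dynamics are exactly linear. This is what makes the model decomposable into the "analyzable linear regimes" the abstract refers to, and why shrinking p both localizes the nonlinear computation and reduces its cost.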


