2026-06-12 École Polytechnique Fédérale de Lausanne (EPFL)
<Related information>
- https://actu.epfl.ch/news/the-hidden-geometry-that-separates-complex-data/
- https://www.pnas.org/doi/10.1073/pnas.2522504123
Kernel embeddings and the separation of measure phenomenon
Leonardo V. Santoro, Kartik G. Waghmare https://orcid.org/0000-0003-0912-685X, and Victor M. Panaretos
Proceedings of the National Academy of Sciences, Published: June 5, 2026
DOI: https://doi.org/10.1073/pnas.2522504123

Significance
Two-sample testing examines whether two probability distributions on some feature space differ based on random samples. It is fundamental in statistics and machine learning, especially when feature spaces are complex. Such settings are challenging because the distributions cannot be modeled parsimoniously, making it difficult to identify plausible deviations and design effective test criteria. We prove that two continuous distributions on a general feature space differ if and only if two corresponding Gaussian measures perfectly separate. These Gaussians are defined via kernel embeddings. Gaussians either overlap or separate in a specific sense, measurable by precise criteria. Our theorem thus serves as a foundation for designing powerful inference tools in general settings and reveals a phenomenon underpinning the effectiveness of kernel methods.
Abstract
We prove that kernel covariance embeddings lead to information-theoretically perfect separation of distinct continuous probability distributions. In statistical terms, we establish that testing for the equality of two nonatomic (Borel) probability measures on a locally compact uncountable Polish space is equivalent to testing for the singularity between two centered Gaussian measures on a reproducing kernel Hilbert space. The corresponding Gaussians are defined via the notion of kernel covariance embedding of a probability measure, and the Hilbert space is that generated by the embedding kernel. Distinguishing singular Gaussians is structurally simpler from an information-theoretic perspective than nonparametric two-sample testing, particularly in complex or high-dimensional domains. This is because singular Gaussians are supported on essentially separate and affine subspaces. Our proof leverages the classical Feldman–Hájek dichotomy, and shows that even a small perturbation of a continuous distribution will be maximally magnified through its Gaussian embedding. This “separation of measure phenomenon” appears to be a blessing of infinite dimensionality, by means of embedding, with the potential to inform the design of efficient inference tools in considerable generality. The elicitation of this phenomenon also appears to crystallize, in a precise and simple mathematical statement, a core mechanism underpinning the empirical effectiveness of kernel methods.
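To make the idea of a kernel covariance embedding more concrete, here is a minimal numerical sketch. It is not the test proposed in the paper. It assumes the embedding maps a distribution P to the centered Gaussian whose covariance is the uncentered kernel second-moment operator S_P = E[k_X ⊗ k_X], and it only compares the empirical versions of two such operators in squared Hilbert-Schmidt norm. Because the Hilbert-Schmidt inner product of k_x ⊗ k_x and k_y ⊗ k_y equals k(x, y)^2, this distance can be computed from Gram matrices alone. The Gaussian kernel, the bandwidth, and the function names are illustrative assumptions, not taken from the source.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def covariance_embedding_hs_distance_sq(x, y, bandwidth=1.0):
    """Squared Hilbert-Schmidt distance between empirical second-moment operators.

    Assumption (hypothetical helper, not from the paper): the operator for a
    sample x_1..x_n is (1/n) * sum_i k_{x_i} (tensor) k_{x_i}. Since
    <k_x (x) k_x, k_y (x) k_y>_HS = k(x, y)**2, the distance reduces to
    averages of squared Gram-matrix entries (a biased V-statistic).
    """
    kxx = gaussian_kernel(x, x, bandwidth)
    kyy = gaussian_kernel(y, y, bandwidth)
    kxy = gaussian_kernel(x, y, bandwidth)
    return np.mean(kxx**2) + np.mean(kyy**2) - 2.0 * np.mean(kxy**2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sample_p = rng.normal(size=(200, 2))            # draws from P
    sample_p2 = rng.normal(size=(200, 2))           # fresh draws from P
    sample_q = rng.normal(loc=1.0, size=(200, 2))   # draws from a shifted Q
    print("P vs P:", covariance_embedding_hs_distance_sq(sample_p, sample_p2))
    print("P vs Q:", covariance_embedding_hs_distance_sq(sample_p, sample_q))
```

This statistic only measures how far apart the two empirical operators are. The abstract's claim is stronger: the two Gaussians defined by these operators are mutually singular whenever the underlying continuous distributions differ. Deciding singularity would need criteria beyond this sketch.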


