2026-04-20 The University of Tokyo

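Overview of this research
Vast amounts of DNA data are compressed with AI, and crop traits (such as yield and plant height) are predicted from the compressed data, making the selection of promising varieties more efficient.
<Related information>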
- https://www.a.u-tokyo.ac.jp/topics/topics_20260420-1.html
- https://acsess.onlinelibrary.wiley.com/doi/10.1002/tpg2.70223
ConvCGP: A convolutional neural network to predict genetic values of agronomic traits from compressed genome-wide polymorphisms
Tanzila Raihan, Chyon Hae Kim, Hiroyuki Shimono, Akio Kimura, Hiroyoshi Iwata
The Plant Genome Published: 19 April 2026
DOI:https://doi.org/10.1002/tpg2.70223
Abstract
The growing size of genome-wide polymorphism data in animal and plant breeding has raised concerns regarding computational load and time, particularly when predicting genetic values for target traits using genomic prediction. Several deep learning and conventional methods, including dimensionality reduction techniques such as principal component analysis (PCA) and autoencoders, have been proposed to address these challenges by selecting subsets of polymorphisms or compressing high-dimensional data for predictive analysis. However, these methods are often computationally intensive and time-consuming. A major challenge in applying deep-learning models directly to high-dimensional genomic data is the substantial computational cost and time required for hyperparameter tuning and model training. To address these limitations, we propose a novel deep learning framework, Compression-based Genomic Prediction using Convolutional Neural Networks (ConvCGP), that integrates autoencoder-based nonlinear compression with convolutional neural network–based prediction in an end-to-end trainable pipeline. This method reduces data to a compact latent representation that retains meaningful information for prediction, thereby significantly reducing storage needs and computational load. We applied ConvCGP to high-dimensional rice datasets for agronomic trait prediction and further tested it on maize, which is large in scale. The results show that ConvCGP maintained prediction accuracy comparable to models trained on uncompressed data, even under extreme compression where only 2% of the original features were retained. This demonstrates that ConvCGP not only scales effectively to massive datasets but also preserves predictive information under drastic dimensionality reduction. 
Moreover, ConvCGP consistently outperformed PCA-based models, genomic best linear unbiased prediction, LASSO (least absolute shrinkage and selection operator), support vector machine, and other methods, establishing it as a powerful, efficient, and scalable solution for modern genomic prediction.
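The compress-then-predict idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' architecture: the layer sizes, the tanh and ReLU activations, the single convolution kernel, the weighting `alpha`, and all names (`encode`, `decode`, `predict`, `joint_loss`) are assumptions chosen for illustration, and the data are synthetic. Only the forward pass and a joint loss are shown; a real model would learn the weights by gradient descent on such a loss, which is what makes the pipeline end-to-end trainable.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (assumptions): 8 lines, 1000 markers, latent size = 2% of markers.
n_samples, n_markers = 8, 1000
latent_dim = round(n_markers * 0.02)  # 20 latent features
kernel_size = 3                       # hypothetical 1D convolution width

# Synthetic genotypes coded 0/1/2 and a synthetic trait; not real data.
X = rng.integers(0, 3, size=(n_samples, n_markers)).astype(np.float64)
y = rng.normal(size=n_samples)

# Randomly initialised weights; training is deliberately omitted here.
W_enc = rng.normal(scale=0.01, size=(n_markers, latent_dim))
W_dec = rng.normal(scale=0.01, size=(latent_dim, n_markers))
conv_kernel = rng.normal(scale=0.1, size=kernel_size)
w_out = rng.normal(scale=0.1, size=latent_dim - kernel_size + 1)


def encode(x):
    """Nonlinear compression: markers -> compact latent representation."""
    return np.tanh(x @ W_enc)


def decode(z):
    """Reconstruct the marker matrix from the latent representation."""
    return z @ W_dec


def predict(z):
    """1D convolution over the latent vector, ReLU, then a linear output."""
    windows = np.lib.stride_tricks.sliding_window_view(z, kernel_size, axis=1)
    feature_map = np.maximum(windows @ conv_kernel, 0.0)
    return feature_map @ w_out


def joint_loss(x, target, alpha=0.5):
    """Weighted sum of reconstruction and prediction mean squared errors."""
    z = encode(x)
    recon = np.mean((decode(z) - x) ** 2)
    pred = np.mean((predict(z) - target) ** 2)
    return alpha * recon + (1.0 - alpha) * pred


Z = encode(X)          # compressed genotypes, shape (n_samples, latent_dim)
y_hat = predict(Z)     # one predicted trait value per line
loss = joint_loss(X, y)
```

In this sketch only `Z` (20 values per line instead of 1000) would need to be stored for downstream prediction, which is the source of the storage and compute savings the abstract refers to.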
Plain Language Summary
Predicting plant traits using genomic data is increasingly important in agriculture. However, the size of genetic datasets has grown rapidly, making it hard to process the data efficiently. To tackle this, we developed a new deep learning method that combines convolutional neural networks (CNNs) with autoencoders. Autoencoders reduce the size of the genetic data, making it easier to process, while CNNs help in predicting plant traits. We tested this approach on large-scale rice and maize genetic data to predict important agronomic traits. Compared to other machine learning and data compression methods, our model made more accurate predictions while also reducing the time and computing power needed. This research shows how combining deep learning tools can help make better use of large genomic datasets in crop breeding.
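For comparison, a linear compression baseline of the kind the summary contrasts against can be sketched as PCA followed by a simple regression on the retained components. This is an illustrative stand-in, not the PCA-based models evaluated in the paper: the synthetic data, the train/test split, the 2% retention applied to PCA, and the ridge penalty `ridge_lambda` are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data (assumption): 60 lines, 500 markers coded 0/1/2.
n_samples, n_markers = 60, 500
n_components = round(n_markers * 0.02)  # keep 10 components (2% of markers)

X = rng.integers(0, 3, size=(n_samples, n_markers)).astype(np.float64)
true_effects = rng.normal(scale=0.1, size=n_markers)
y = X @ true_effects + rng.normal(scale=0.5, size=n_samples)

train, test = np.arange(0, 45), np.arange(45, 60)

# Centre with training means only, so no test information leaks into the fit.
marker_mean = X[train].mean(axis=0)
y_mean = y[train].mean()
Xc_train = X[train] - marker_mean
Xc_test = X[test] - marker_mean

# PCA via SVD: the rows of Vt are the principal axes of the training markers.
_, _, Vt = np.linalg.svd(Xc_train, full_matrices=False)
components = Vt[:n_components]
Z_train = Xc_train @ components.T
Z_test = Xc_test @ components.T

# Ridge regression on the compressed scores (closed-form solution).
ridge_lambda = 1.0  # hypothetical penalty; would normally be tuned
A = Z_train.T @ Z_train + ridge_lambda * np.eye(n_components)
beta = np.linalg.solve(A, Z_train.T @ (y[train] - y_mean))
y_pred = Z_test @ beta + y_mean

# Correlation between predicted and observed values on held-out lines.
accuracy = np.corrcoef(y_pred, y[test])[0, 1]
```

Unlike an autoencoder, PCA is a purely linear projection fitted without reference to the trait, which is one plausible reason a jointly trained nonlinear compressor could retain more predictive information at the same compression level.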

