Compressing DNA Information by 98% to Predict Crop Traits with High Accuracy: AI Greatly Shortens Computation Time and Accelerates Crop Breeding

2026-04-20 The University of Tokyo

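A research team at the Graduate School of Agricultural and Life Sciences, The University of Tokyo, has developed ConvCGP, an AI method that compresses crop DNA information by up to 98% while still predicting crop traits with high accuracy. The method first compresses genomic data with an autoencoder and then predicts traits such as yield and plant height with a convolutional neural network. This design makes it possible to process genetic-marker information on the scale of several million to 10 million markers efficiently. In tests on rice and maize, prediction accuracy after compression remained at or above that of conventional methods, and computation time was greatly reduced, from a range of several minutes to several days down to a range of tens of seconds to tens of minutes. The method enables rapid selection of promising varieties and is expected to speed up crop breeding, lower its costs, and advance data-driven agriculture.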
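A 98% compression means keeping roughly 2% of the input features, which matches the abstract's statement that accuracy held up when only 2% of the original features were retained. The short calculation below shows what that means at the marker counts mentioned above. The helper names are hypothetical, and the sketch only counts features; it does not model how ConvCGP's autoencoder actually sets its latent size.

```python
def latent_size(n_markers: int, retained_fraction: float = 0.02) -> int:
    """Number of features kept when only `retained_fraction` of the inputs survive.

    Hypothetical helper: the 2% default mirrors the figure quoted in the article.
    """
    if n_markers <= 0:
        raise ValueError("n_markers must be positive")
    if not 0.0 < retained_fraction <= 1.0:
        raise ValueError("retained_fraction must be in (0, 1]")
    return max(1, round(n_markers * retained_fraction))


def reduction_percent(n_markers: int, n_latent: int) -> float:
    """Percentage by which the feature count shrinks."""
    return 100.0 * (1 - n_latent / n_markers)


for n in (1_000_000, 10_000_000):
    k = latent_size(n)
    print(f"{n:,} markers -> {k:,} latent features ({reduction_percent(n, k):.0f}% fewer)")
```

For one million markers this leaves 20,000 features per plant, and for ten million it leaves 200,000.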

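Overview of this study
Large volumes of DNA information are compressed with AI, and crop traits (such as yield and plant height) are predicted from the compressed data, making the selection of promising varieties more efficient.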
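Once trait values can be predicted from DNA data alone, picking promising varieties comes down to ranking the candidates by their predictions. The sketch below shows simple truncation selection on predicted values. The function name `select_top`, the `higher_is_better` flag, the line IDs, and the yield numbers are all hypothetical, and this selection step is an illustration rather than part of ConvCGP as described here.

```python
def select_top(predicted: dict[str, float], fraction: float,
               higher_is_better: bool = True) -> list[str]:
    """Return IDs of candidate lines whose predicted value falls in the best `fraction`.

    Hypothetical helper. Set higher_is_better=False for traits where lower values
    are preferred (for example, shorter plants).
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    n_keep = max(1, round(len(predicted) * fraction))
    ranked = sorted(predicted, key=predicted.get, reverse=higher_is_better)
    return ranked[:n_keep]


# Made-up predicted yields (t/ha) for five candidate lines.
predictions = {"line_A": 5.1, "line_B": 6.3, "line_C": 4.8, "line_D": 5.9, "line_E": 6.0}
print(select_top(predictions, 0.4))  # ['line_B', 'line_E']
```

Because only genotypes are needed to produce `predictions`, this ranking can be done before the candidates are grown in the field.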

<Related information>

ConvCGP: A convolutional neural network to predict genetic values of agronomic traits from compressed genome-wide polymorphisms

Tanzila Raihan, Chyon Hae Kim, Hiroyuki Shimono, Akio Kimura, Hiroyoshi Iwata
The Plant Genome, Published: 19 April 2026
DOI:https://doi.org/10.1002/tpg2.70223

Abstract

The growing size of genome-wide polymorphism data in animal and plant breeding has raised concerns regarding computational load and time, particularly when predicting genetic values for target traits using genomic prediction. Several deep learning and conventional methods, including dimensionality reduction techniques such as principal component analysis (PCA) and autoencoders, have been proposed to address these challenges by selecting subsets of polymorphisms or compressing high-dimensional data for predictive analysis. However, these methods are often computationally intensive and time-consuming. A major challenge in applying deep-learning models directly to high-dimensional genomic data is the substantial computational cost and time required for hyperparameter tuning and model training. To address these limitations, we propose a novel deep learning framework, Compression-based Genomic Prediction using Convolutional Neural Networks (ConvCGP), that integrates autoencoder-based nonlinear compression with convolutional neural network–based prediction in an end-to-end trainable pipeline. This method reduces data to a compact latent representation that retains meaningful information for prediction, thereby significantly reducing storage needs and computational load. We applied ConvCGP to high-dimensional rice datasets for agronomic trait prediction and further tested it on a large-scale maize dataset. The results show that ConvCGP maintained prediction accuracy comparable to models trained on uncompressed data, even under extreme compression where only 2% of the original features were retained. This demonstrates that ConvCGP not only scales effectively to massive datasets but also preserves predictive information under drastic dimensionality reduction.
Moreover, ConvCGP consistently outperformed PCA-based models, genomic best linear unbiased prediction, LASSO (least absolute shrinkage and selection operator), support vector machine, and other methods, establishing it as a powerful, efficient, and scalable solution for modern genomic prediction.
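The abstract names PCA-based models and genomic best linear unbiased prediction (GBLUP) among the baselines. As a rough illustration of how such baselines are commonly built and scored, and not the paper's actual protocol, the sketch below fits ridge regression on all markers (the idea behind RR-BLUP, which is closely related to GBLUP) and on principal-component scores that keep 2% as many features as there are markers. Each model is scored by the Pearson correlation between predicted and observed values on held-out lines. The data are simulated, and the 0/1/2 marker coding, heritability, penalty values, and train/test split are all assumptions.

```python
import numpy as np


def simulate(rng, n_lines=300, n_markers=2000, n_qtl=50, h2=0.5):
    """Synthetic 0/1/2 genotypes and a polygenic trait (made-up data, not from the study)."""
    X = rng.binomial(2, 0.5, size=(n_lines, n_markers)).astype(float)
    effects = np.zeros(n_markers)
    causal = rng.choice(n_markers, size=n_qtl, replace=False)
    effects[causal] = rng.normal(size=n_qtl)
    g = X @ effects                                  # true genetic values
    noise_sd = np.sqrt(g.var() * (1 - h2) / h2)      # noise scaled to the target heritability
    y = g + rng.normal(scale=noise_sd, size=n_lines)
    return X, y


def ridge_fit(X, y, lam):
    """Ridge regression on centered data, solved in the n x n dual form."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    # X'(XX' + lam*I)^-1 y equals (X'X + lam*I)^-1 X'y but needs only an n x n solve.
    alpha = np.linalg.solve(Xc @ Xc.T + lam * np.eye(len(y)), y - y_mean)
    return Xc.T @ alpha, x_mean, y_mean


def ridge_predict(model, X):
    coef, x_mean, y_mean = model
    return (X - x_mean) @ coef + y_mean


def pca_fit(X, k):
    """Top-k principal axes of the training genotypes."""
    x_mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - x_mean, full_matrices=False)
    return vt[:k], x_mean


def pca_transform(pca, X):
    axes, x_mean = pca
    return (X - x_mean) @ axes.T


def accuracy(pred, obs):
    """Prediction accuracy as the Pearson correlation between predicted and observed values."""
    return float(np.corrcoef(pred, obs)[0, 1])


rng = np.random.default_rng(0)
X, y = simulate(rng)
train, test = np.arange(240), np.arange(240, 300)   # simple hold-out split (assumption)

# Baseline 1: ridge on all markers (RR-BLUP-style, closely related to GBLUP).
full_model = ridge_fit(X[train], y[train], lam=1000.0)
r_full = accuracy(ridge_predict(full_model, X[test]), y[test])

# Baseline 2: PCA keeping 2% as many features as markers, then ridge on the scores.
k = round(0.02 * X.shape[1])                         # 40 components for 2,000 markers
pca = pca_fit(X[train], k)
Z_train, Z_test = pca_transform(pca, X[train]), pca_transform(pca, X[test])
pca_model = ridge_fit(Z_train, y[train], lam=1.0)
r_pca = accuracy(ridge_predict(pca_model, Z_test), y[test])

print(f"ridge on {X.shape[1]} markers: r = {r_full:.2f}")
print(f"PCA ({k} components) + ridge: r = {r_pca:.2f}")
```

The dual form is used because genomic data usually have far more markers than lines, so solving an n x n system is much cheaper than a p x p one. Note that PCA here is fitted on training lines only, so no information from the test lines leaks into the compressed features.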

Plain Language Summary

Predicting plant traits using genomic data is increasingly important in agriculture. However, the size of genetic datasets has grown rapidly, making it hard to process the data efficiently. To tackle this, we developed a new deep learning method that combines convolutional neural networks (CNNs) with autoencoders. Autoencoders reduce the size of the genetic data, making it easier to process, while CNNs help in predicting plant traits. We tested this approach on large-scale rice and maize genetic data to predict important agronomic traits. Compared to other machine learning and data compression methods, our model made more accurate predictions while also reducing the time and computing power needed. This research shows how combining deep learning tools can help make better use of large genomic datasets in crop breeding.
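The summary describes two parts: an autoencoder that shrinks the genetic data and a CNN that predicts traits from the compressed representation. The NumPy sketch below shows only the forward pass of such a combination and a joint objective. It assumes genotypes coded as 0/1/2 allele counts, a single dense encoder layer and decoder layer, one 1D convolution over the latent vector, and a loss that adds reconstruction error to weighted prediction error. The class name `ConvCGPSketch`, all layer sizes, the activations, and the loss weighting `alpha` are hypothetical and are not the published ConvCGP configuration.

```python
import numpy as np

rng = np.random.default_rng(42)


def relu(x):
    return np.maximum(x, 0.0)


class ConvCGPSketch:
    """Forward pass of an autoencoder + 1D CNN predictor (hypothetical sizes and layers)."""

    def __init__(self, n_markers, n_latent, n_filters=8, kernel_size=5):
        self.W_enc = rng.normal(scale=1.0 / np.sqrt(n_markers), size=(n_markers, n_latent))
        self.W_dec = rng.normal(scale=1.0 / np.sqrt(n_latent), size=(n_latent, n_markers))
        self.kernels = rng.normal(scale=0.1, size=(n_filters, kernel_size))
        conv_len = n_latent - kernel_size + 1            # length of a 'valid' convolution
        self.W_out = rng.normal(scale=0.01, size=n_filters * conv_len)
        self.b_out = 0.0

    def encode(self, X):
        return relu(X @ self.W_enc)                      # (n, k) compressed representation

    def decode(self, Z):
        return Z @ self.W_dec                            # (n, p) reconstructed genotypes

    def predict(self, Z):
        # Windows of shape (n, conv_len, kernel_size) slide along each latent vector;
        # as in deep-learning frameworks, this "convolution" is a cross-correlation.
        windows = np.lib.stride_tricks.sliding_window_view(Z, self.kernels.shape[1], axis=1)
        feature_maps = relu(np.einsum("nlw,fw->nfl", windows, self.kernels))
        return feature_maps.reshape(len(Z), -1) @ self.W_out + self.b_out   # (n,)

    def loss(self, X, y, alpha=1.0):
        """Joint objective: reconstruction MSE + alpha * prediction MSE (weighting assumed)."""
        Z = self.encode(X)
        recon = np.mean((self.decode(Z) - X) ** 2)
        pred = np.mean((self.predict(Z) - y) ** 2)
        return recon + alpha * pred


X = rng.binomial(2, 0.5, size=(16, 1000)).astype(float)   # 16 lines x 1,000 markers
y = rng.normal(size=16)                                      # made-up trait values
model = ConvCGPSketch(n_markers=1000, n_latent=20)           # keeps 2% of the features
Z = model.encode(X)
print(Z.shape, model.predict(Z).shape, model.loss(X, y))
```

Training end to end would mean minimizing this joint loss with respect to all weights at once, normally with an automatic-differentiation framework; that step is left out here. Sharing the encoder between the two terms is what lets the prediction error shape which information the compressed representation keeps.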
