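Researchers use knowledge-guided machine learning to boost accuracy of agricultural nitrous oxide predictions
2022-04-27 University of Minnesota
The study was recently published in Geoscientific Model Development, a not-for-profit international scientific journal focused on numerical models of the Earth. The participating researchers came from four institutions: the University of Minnesota, the University of Illinois at Urbana-Champaign, Lawrence Berkeley National Laboratory, and the University of Pittsburgh.
<Related information>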
- https://cse.umn.edu/college/news/new-study-could-help-reduce-agricultural-greenhouse-gas-emissions
- https://gmd.copernicus.org/articles/15/2839/2022/
KGML-ag: a modeling framework of knowledge-guided machine learning to simulate agroecosystems: a case study of estimating N2O emission using data from mesocosm experiments
Licheng Liu, Shaoming Xu, Jinyun Tang, Kaiyu Guan, Timothy J. Griffis, Matthew D. Erickson, Alexander L. Frie, Xiaowei Jia, Taegon Kim, Lee T. Miller, Bin Peng, Shaowei Wu, Yufeng Yang, Wang Zhou, Vipin Kumar, and Zhenong Jin
Geoscientific Model Development Published: 07 Apr 2022
DOI:https://doi.org/10.5194/gmd-15-2839-2022
Abstract
Agricultural nitrous oxide (N2O) emission accounts for a non-trivial fraction of the global greenhouse gas (GHG) budget. To date, estimating N2O fluxes from cropland remains a challenging task because the related microbial processes (e.g., nitrification and denitrification) are controlled by complex interactions among climate, soil, plant and human activities. Existing approaches such as process-based (PB) models have well-known limitations due to insufficient representations of the processes or uncertainties of model parameters. To leverage recent advances in machine learning (ML), a new method is needed to unlock the “black box” and overcome ML's own limitations, such as low interpretability, out-of-sample failure and massive data demand. In this study, we developed a first-of-its-kind knowledge-guided machine learning model for agroecosystems (KGML-ag) by incorporating biogeophysical and chemical domain knowledge from an advanced PB model, ecosys, and tested it by comparing simulated daily N2O fluxes with real observed data from mesocosm experiments. The gated recurrent unit (GRU) was used as the basis to build the model structure. To optimize the model performance, we investigated a range of ideas, including (1) using initial values of intermediate variables (IMVs) instead of time series as model input to reduce data demand; (2) building hierarchical structures to explicitly estimate IMVs for further N2O prediction; (3) using multi-task learning to balance the simultaneous training on multiple variables; and (4) pre-training with millions of synthetic data generated from ecosys and fine-tuning with mesocosm observations. Six other pure ML models were developed using the same mesocosm data to serve as the benchmark for the KGML-ag model. Results show that KGML-ag did an excellent job in reproducing the mesocosm N2O fluxes (overall r2=0.81, and RMSE=3.6 from cross validation).
Importantly, KGML-ag always outperforms the PB model and ML models in predicting N2O fluxes, especially for complex temporal dynamics and emission peaks. Besides, KGML-ag goes beyond the pure ML models by providing more interpretable predictions as well as pinpointing desired new knowledge and data to further empower the current KGML-ag. We believe the KGML-ag development in this study will stimulate a new body of research on interpretable ML for biogeochemistry and other related geoscience processes.
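For readers unfamiliar with the two scores quoted in the abstract, the following is a minimal sketch of how r2 and RMSE can be computed for paired observed and predicted daily fluxes. It is not the authors' evaluation code. It assumes r2 means the squared Pearson correlation (the abstract does not say which definition was used), and the function names and sample values are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between paired observations and predictions."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(observed, predicted):
    """Squared Pearson correlation (assumed definition of r2)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("r2 is undefined when either series is constant")
    return cov ** 2 / (var_o * var_p)


# Hypothetical daily N2O fluxes, invented purely for illustration.
observed_flux = [0.5, 1.0, 6.0, 12.0, 4.0, 1.5]
predicted_flux = [0.7, 1.4, 5.0, 10.5, 4.8, 1.2]

print("r2   =", round(r_squared(observed_flux, predicted_flux), 3))
print("RMSE =", round(rmse(observed_flux, predicted_flux), 3))
```

RMSE is expressed in the same units as the flux itself, so it penalizes missed emission peaks heavily, while r2 only measures how well the predicted time series co-varies with the observations. Reporting both, as the abstract does, covers timing and magnitude.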