Pest recognition AI "PestVL-Net" developed for smart agriculture (PestVL-Net Enhances Pest Recognition for Smart Agriculture)

2026-05-27 Hefei Institutes of Physical Science (HFIPS)

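A research team led by Prof. ZHANG Jie and Prof. XIE Chengjun of the Institute of Intelligent Machines, Hefei Institutes of Physical Science, Chinese Academy of Sciences, has developed PestVL-Net, a new pest recognition framework for smart agriculture. The work has been accepted to CVPR 2026 Findings. In agricultural settings, the diversity of pest species, their similar appearance, and the difficulty of collecting high-quality image data are persistent obstacles, and conventional image recognition techniques have not achieved sufficient accuracy. PestVL-Net adopts a multimodal AI that integrates image and text information, attending to the important regions of an image to extract subtle differences in shape and texture. It also incorporates structured pest descriptions built from agricultural knowledge and large language models, which lets it distinguish visually similar pests with high accuracy. Evaluated on multiple public datasets and new pest datasets, it reached a recognition accuracy of about 88 to 90%, surpassing existing methods. The research team says the technology will help advance precision agriculture and pest and disease control.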

Feature Map Visualization of Different Modules (Image by ZHANG Jie)

＜Related information＞

PestVL-Net: Enabling Multimodal Pest Learning via Fine-grained Vision-Language Interaction

Xueheng Li, Tao Hu, Ke Cao, Runsheng Qi, Huixin Zhang, Rui Li, Jie Zhang, Chengjun Xie
arXiv, submitted on 19 Apr 2026
DOI: https://doi.org/10.48550/arXiv.2604.17278

Abstract

Effective pest recognition and management are crucial for sustainable agricultural development. However, collecting pest data in real scenarios is often challenging. Compared to other domains, pests exhibit a wide variety of species with complex and diverse morphological characteristics. Existing techniques struggle to effectively model the key visual and high-level semantic features of pests in a fine-grained manner. These limitations hinder the practical application of such methods in real agricultural scenarios. To address these critical challenges, we present a synergistic approach that integrates PestVL-Net, a novel vision-language framework, with two multi-species pest datasets to facilitate fine-grained pest learning. The visual pathway of PestVL-Net utilizes the Recurrent Weighted Key Value (RWKV) architecture, incorporating a saliency-guided adaptive window partitioning scheme to effectively model the fine-grained visual characteristics of pests. Concurrently, the linguistic component generates precise pest semantic descriptions by leveraging Multimodal Large Language Models (MLLMs) priors, critically informed by agricultural expert knowledge and structured via multimodal Chain-of-Thought (CoT) reasoning. The deep fusion of these complementary visual and textual representations enables fine-grained multimodal pest learning. Extensive experimental evaluations on multiple pest datasets validate the superior performance of PestVL-Net, highlighting its potential for effective real-world pest management.
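
The abstract mentions a saliency-guided adaptive window partitioning scheme but does not spell out how it works. As a rough illustration of the general idea only, the sketch below uses a simple quadtree rule: a window is split into four while it contains any salient pixel, so salient regions end up covered by small windows and background by large ones. The quadtree rule, the function names, and the `threshold` and `min_size` parameters are all assumptions made for this example and are not taken from the paper.

```python
from typing import List, Tuple

# A window is (top, left, height, width) in pixel coordinates.
Window = Tuple[int, int, int, int]


def max_saliency(sal: List[List[float]], top: int, left: int, h: int, w: int) -> float:
    """Largest saliency value inside one window."""
    return max(sal[r][c] for r in range(top, top + h) for c in range(left, left + w))


def partition(sal: List[List[float]], top: int = 0, left: int = 0,
              h: int = 0, w: int = 0,
              threshold: float = 0.5, min_size: int = 2) -> List[Window]:
    """Hypothetical quadtree partition: split windows that contain salient pixels."""
    if h == 0:
        h = len(sal)
    if w == 0:
        w = len(sal[0])
    # Stop when a further split would produce windows smaller than min_size,
    # or when nothing inside the window is salient enough.
    too_small = h // 2 < min_size or w // 2 < min_size
    if too_small or max_saliency(sal, top, left, h, w) < threshold:
        return [(top, left, h, w)]
    h2, w2 = h // 2, w // 2
    quadrants = [
        (top, left, h2, w2),
        (top, left + w2, h2, w - w2),
        (top + h2, left, h - h2, w2),
        (top + h2, left + w2, h - h2, w - w2),
    ]
    windows: List[Window] = []
    for q_top, q_left, q_h, q_w in quadrants:
        windows += partition(sal, q_top, q_left, q_h, q_w, threshold, min_size)
    return windows


# Toy 8x8 saliency map with one salient 2x2 patch in the top-left corner.
demo_map = [[0.0] * 8 for _ in range(8)]
for r in range(2):
    for c in range(2):
        demo_map[r][c] = 1.0

windows = partition(demo_map)
for win in windows:
    print(win)
```

On this toy map the top-left quadrant is refined into 2x2 windows while the other three quadrants stay as single 4x4 windows, which is the qualitative behavior an adaptive partitioning scheme aims for: finer windows where the pest is likely to be, coarser ones elsewhere.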
