AgriPestDatabase-v1.0 Dataset Supports Training Lightweight Agricultural LLMs for Pest Management

Researchers have released AgriPestDatabase-v1.0, a structured insect dataset designed to train lightweight large language models for agricultural pest management, according to a paper accepted at the Artificial Super Intelligence Conference 2026.

The paper, posted on arxiv.org, says the dataset is meant to bring expert pest-management knowledge to farmers in rural regions with limited or no internet connectivity. The researchers collected textual data by “reviewing and collecting information from available pest databases and published manuscripts on nine selected pest species,” with validation by domain experts. They then compiled this material into structured reports and used those reports to construct question-and-answer pairs for model training and evaluation.
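To make the report-to-Q/A step concrete, here is a minimal Python sketch of how question-and-answer pairs might be generated from a structured pest report. It is illustrative only: the `PestReport` class, the section names, the question templates, and the placeholder text are all assumptions, not the schema or procedure used in AgriPestDatabase-v1.0.

```python
from dataclasses import dataclass


@dataclass
class PestReport:
    """Hypothetical structured report for one pest species."""
    species: str
    sections: dict  # section name -> expert-validated text


# Hypothetical question templates, one per report section.
QUESTION_TEMPLATES = {
    "identification": "How can I identify {species}?",
    "damage": "What damage does {species} cause?",
    "management": "How should {species} be managed?",
}


def build_qa_pairs(report):
    """Create one Q/A pair for every templated section present in the report."""
    pairs = []
    for section, template in QUESTION_TEMPLATES.items():
        answer = report.sections.get(section)
        if answer:  # skip sections the report does not cover
            pairs.append({
                "question": template.format(species=report.species),
                "answer": answer,
            })
    return pairs


# Usage with placeholder text (not real pest data).
example = PestReport(
    species="example pest",
    sections={
        "identification": "placeholder identification text",
        "management": "placeholder management text",
    },
)
for pair in build_qa_pairs(example):
    print(pair["question"], "->", pair["answer"])
```

In this sketch, a section missing from a report simply yields no question, so every generated answer traces back to text that was present in the report.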

The team applied LoRA-based fine-tuning to multiple lightweight LLMs (≤7B parameters) for edge device deployment. According to the paper, Mistral 7B achieved an 88.9% pass rate on domain-specific Q/A tasks, substantially outperforming Qwen 2.5 7B (63.9%) and LLaMA 3.1 8B (58.7%). The researchers noted that “Mistral demonstrates higher semantic alignment (embedding similarity: 0.865) despite lower lexical overlap (BLEU: 0.097), indicating that semantic understanding and robust reasoning are more predictive of task success than surface-level conformity in specialized domains.”
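The contrast between semantic alignment and lexical overlap can be illustrated with a short Python sketch. It assumes answers have already been turned into embedding vectors by some model, and it uses clipped unigram precision as a simplified stand-in for lexical overlap; this is not the full BLEU metric, and the function names are hypothetical rather than taken from the paper's evaluation code.

```python
import math
from collections import Counter


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def unigram_precision(candidate, reference):
    """Share of candidate tokens that also occur in the reference.

    Counts are clipped so a repeated token is not credited more often
    than it appears in the reference. Simplified stand-in, not BLEU.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    total = sum(cand.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[token]) for token, count in cand.items())
    return matched / total


def pass_rate(verdicts):
    """Fraction of answers judged as passing (True) in a list of verdicts."""
    if not verdicts:
        return 0.0
    return sum(1 for v in verdicts if v) / len(verdicts)
```

A paraphrased answer such as "rotate crops yearly" shares no tokens with a reference like "change the planted crop each season", so its unigram precision is zero even though an embedding model could place the two sentences close together. That is the kind of gap between semantic and lexical scores the researchers describe.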

The work “demonstrates the feasibility of deploying compact, high-performing language models for practical field-level pest management guidance,” according to the paper.