Comparing Machine Learning Methods in Classifying Red Blotch and Leafroll Viruses in VIS Hyperspectral Images of Leaves
Erica Sawyer, Eve Laroche-Pinel, Madison Flasco, Monica Cooper, Benjamin Corrales, Marc Fuchs, and Luca Brillante*
*Department of Viticulture and Enology, California State University Fresno, 2360 E Barstow Ave, Fresno, CA, 93740 (lucabrillante@csufresno.edu)
Hyperspectral imaging spectrometry offers new opportunities for viral disease scouting. We compared two machine-learning methods, Random Forest (RF) and a 3D Convolutional Neural Network (CNN), for identifying and distinguishing leaves from red blotch-infected vines, leafroll-infected vines, and vines co-infected with both viruses, using spatiospectral information in the visible domain (510 to 710 nm). Models were assessed with five-fold cross-validation. In the binary classification of infected versus non-infected leaves at mid-ripening, the CNN model reached 85.5% accuracy versus 80% for the RF model. The accuracy of both models decreased when leaf samples collected at veraison were analyzed. In the multiclass classification, the CNN and RF models reached accuracies of 70% and 68%, respectively, at mid-ripening (averaged across the healthy and infected leaf categories), and 60% and 63% at veraison. Comparing machine-learning predictions with PCR-based virus identification showed that the leafroll-infected category was classified most accurately, followed by the non-infected, red blotch-infected, and co-infected categories. When two leaves were imaged per plant and predictions were obtained independently for each leaf, variability in symptom expression between leaves affected the RF model more than the CNN model: the CNN model assigned the same class to both leaves of a plant ~75% of the time, whereas the RF model did so only 55% of the time. When two leaves were used together to predict the infection status of a plant, prediction accuracy increased for both models, especially in the harder-to-predict categories. Interpretation of the RF model showed that the most important wavelengths were in the green, orange, and red subregions, and were associated with changes in pigment concentration and with chlorophyll and carotenoid absorption.
Although distinguishing plants co-infected with grapevine leafroll-associated viruses (GLRaVs) and grapevine red blotch virus (GRBV) proved challenging, both models showed promising accuracies across categories.
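The two-leaf analysis described above can be illustrated with a minimal sketch. The abstract does not state how predictions from two leaves were combined, so averaging the per-leaf class probabilities is an assumption made here for illustration only. The function names (`predict_leaf`, `predict_plant`, `leaf_agreement`) and the probability values are hypothetical and are not taken from the study.

```python
# Hypothetical sketch: combine two per-leaf predictions into one plant-level call.
# Assumption: each model outputs one probability per class for each leaf.
CLASSES = ["non-infected", "red blotch", "leafroll", "co-infected"]


def predict_leaf(probs):
    """Return the class label with the highest probability for one leaf."""
    best = max(range(len(CLASSES)), key=lambda i: probs[i])
    return CLASSES[best]


def predict_plant(leaf_a, leaf_b):
    """Average the two leaves' class probabilities, then pick the top class.

    Averaging is an assumed combination rule, not the authors' stated method.
    """
    averaged = [(a + b) / 2 for a, b in zip(leaf_a, leaf_b)]
    return predict_leaf(averaged)


def leaf_agreement(pairs):
    """Fraction of plants whose two leaves receive the same predicted class."""
    same = sum(1 for a, b in pairs if predict_leaf(a) == predict_leaf(b))
    return same / len(pairs)


# Made-up probabilities for two plants (two leaves each), for illustration only.
pairs = [
    ([0.10, 0.20, 0.60, 0.10], [0.05, 0.15, 0.70, 0.10]),  # leaves agree
    ([0.40, 0.35, 0.15, 0.10], [0.20, 0.55, 0.15, 0.10]),  # leaves disagree
]

for leaf_a, leaf_b in pairs:
    print(predict_leaf(leaf_a), "|", predict_leaf(leaf_b), "->", predict_plant(leaf_a, leaf_b))
print("agreement:", leaf_agreement(pairs))
```

In this sketch, `leaf_agreement` corresponds to the frequency with which a model assigns the same class to both leaves of a plant, and `predict_plant` shows one way a second leaf can settle a plant whose leaves disagree.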
Funding Support: CDFA-SCBGP, CSU-ARI System Grants