Thompson Rivers University

“Multi-scale spatial image analysis for mapping spotted knapweed in grassland ecosystems”

by Shohreh Sahebi

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF Master of Science in Environmental Science

KAMLOOPS, BRITISH COLUMBIA
March 2024

Dr. David Hill
Dr. Wendy Gardner
Dr. Musfiq Rahman

©Shohreh Sahebi, 2024

This study investigates the value of scale-space representations of spatial features for the identification and mapping of invasive plant species in high-resolution multispectral imagery. Scale-space representations combine the spatial domain with a scale dimension so that spatial features can be represented at multiple spatial scales. In this work, Gaussian pyramids (GPs) are constructed to create discrete representations of the spatial features across the scale space. A case study is employed to compare the performance of classifiers constructed using features at individual levels of the GP scale space with that of a classifier constructed using all features across the scale space. This case study explores the identification and mapping of spotted knapweed in a grassland ecosystem using multispectral imagery acquired with a remotely piloted aircraft system (RPAS). Given the large number of features, feature optimization was critical for developing high-performing classifiers. Classification was performed using two machine learning classifiers, random forest and support vector machine (SVM). The results of this case study show that very high spatial resolution features do not produce the best image classifications; rather, there is an optimal scale of the image features, coarser than that of the raw imagery, that produces the best classification accuracy. The results also show that classification is not improved by the inclusion of features at multiple spatial scales.
These findings suggest not only that feature spatial scale optimization can improve image analysis, but also that this optimization can inform RPAS flight planning to improve mission efficiency.

Keywords: Remote sensing, Gaussian pyramid, Scale-space, Random forest, Support vector machine, Invasive plants, Centaurea stoebe, Remotely piloted aircraft systems (RPAS)

Table of Contents

Chapter 1 Introduction
  Introduction
  Preserving Grassland Ecosystems
  Remote Sensing Techniques for Mapping Invasive Plant Species
  Enhancing Plant Species Invasion Monitoring with Remotely Piloted Aircraft Systems
  Machine Learning Algorithms for Mapping Invasive Plants
  Objectives
  Study Site
  Data Acquisition and Image Processing
  Roadmap of the Thesis
Chapter 2 Method
  Introduction
  Machine Learning: an Overview
    Data Partitioning: Training and Validation Data
    Cross Validation
    Recursive Feature Elimination (RFE)
  Random forest
    Optimizing RF performance: Hyperparameters Tuning
  Support Vector Machine (SVM)
    Optimizing SVM Performance: Hyperparameters Tuning
  Optimizing Machine Learning Performance: Grid Search
  Gray Level Co-occurrence Matrix (GLCM) Based Texture Analysis
    Mathematical Representation of GLCM-based Features
  Introduction of Scale-Space Theory and the Gaussian Pyramid
    Importance of Multi-scale Analysis
    Axiomatic Derivations and the Gaussian Approach
    Gaussian Convolution and Scale-Space
    Gaussian Pyramids
  Conclusion
Chapter 3 Invasive Species Mapping: A Case Study on Spotted Knapweed Detection in Grassland Ecosystems
  Introduction
  Methods
    Feature Extraction
    Feature Compilation
    Model Creation
  Results
    Feature Selection
    Hyperparameter Optimization
    Classifier Performance
  Discussion
    Mapping Spotted Knapweed at Site 3
  Conclusion
Chapter 4 Conclusion & Future works
  Conclusion
  Future Work - Expanding Methodologies
    Generative Adversarial Networks for Training Data Augmentation
    Feature Optimization through Data Compression
    Other suggestions
REFERENCES

List of Figures:

Figure 1: Locations of the 3 field sites in Laurie Guichon Memorial Grasslands Interpretive Site (LGMGIS) explored in this study. This map uses the WGS 84 UTM Zone 10N coordinate system.
Figure 2: A diagram demonstrating threefold cross-validation. Symbols represent training set samples, divided into three groups. Sequentially, each group is excluded during model training. Performance estimates, such as error rate, are derived for each withheld sample set. Averaging these performance metrics gives the cross-validation estimate of model performance.
Figure 3: Decision tree structure showcasing root nodes, decision nodes, leaf nodes, and branches.
Figure 4: Illustration of an SVM classifier with a linear kernel applied to linearly separable data, highlighting the optimal separating hyperplane (solid line) and the margins (dashed lines) defined by the support vectors (circled points).
Figure 5: Visualization of a non-linear SVM classification, showing how a non-linear boundary effectively separates two distinct classes in the feature space.
Figure 6: Illustration of the kernel trick applied to data in a 3D space, demonstrating the transformation that facilitates the separation of classes which are not linearly separable in the original dimensions.
Figure 7: Illustration of the four primary directions (0°, 45°, 90°, and 135°) used in the computation of Haralick texture features for GLCM analysis, depicting the relative positioning of pixel pairs.
Figure 8: 3D representation of a 2D uniform Gaussian kernel, visualizing the symmetrical distribution and peak concentration at the origin.
Figure 9: Gaussian Pyramid. The image illustrates five levels of the GP, spanning from the original image at level l0 to the fifth level, l4.
Figure 10: Gaussian Pyramid. The image illustrates five levels of the GP, spanning from the original image at level l0 to the fifth level, l4. Note the extent of the physical domain represented by each image stays the same despite the reduction in the number of pixels constituting the image.
Figure 11: A procedure of classifying Spotted Knapweed using RF and SVM classifiers.
Figure 12: RFECV Results for GP0, showing the optimal selection of 21 features. The x-axis represents the number of features retained, and the y-axis depicts the average of the mean cross-validation scores, as calculated over 20 iterations.
Figure 13: RFECV Results for GP1, showing the optimal selection of 16 features. The x-axis represents the number of features retained, and the y-axis depicts the average of the mean cross-validation scores, as calculated over 20 iterations.
Figure 14: RFECV Results for GP2, showing the optimal selection of 4 features. The x-axis represents the number of features retained, and the y-axis depicts the average of the mean cross-validation scores, as calculated over 20 iterations.
Figure 15: RFECV Results for GP3, showing the optimal selection of 16 features. The x-axis represents the number of features retained, and the y-axis depicts the average of the mean cross-validation scores, as calculated over 20 iterations.
Figure 16: RFECV Results for GP4, showing the optimal selection of 8 features. The x-axis represents the number of features retained, and the y-axis depicts the average of the mean cross-validation scores, as calculated over 20 iterations.
Figure 17: RFECV Results for GPs, showing the optimal selection of 5 features. The x-axis represents the number of features retained, and the y-axis depicts the average of the mean cross-validation scores, as calculated over 20 iterations.
Figure 18: RF confusion matrix for the GP0 validation set.
Figure 19: SVM confusion matrix for GP0 validation set.
Figure 20: RF confusion matrix for the GP1 validation set.
Figure 21: SVM confusion matrix for the GP1 validation set.
Figure 22: RF confusion matrix for the GP2 validation set.
Figure 23: SVM confusion matrix for the GP2 validation set.
Figure 24: RF confusion matrix for the GP3 validation set.
Figure 25: SVM confusion matrix for the GP3 validation set.
Figure 26: RF confusion matrix for the GP4 validation set.
Figure 27: SVM confusion matrix for the GP4 validation set.
Figure 28: RF confusion matrix for the concatenated GPs validation set.
Figure 29: SVM confusion matrix for the concatenated GPs validation set.
Figure 30: VNIR generated image from flight data collected at field site 3 on July 4, 2018.
Figure 31: RF Classification map generated using GLCM-GP0 meta pixel-based image analysis, illustrating the relative abundance of spotted knapweed.
Figure 32: RF Classification map generated using GLCM-GP2 meta pixel-based image analysis, illustrating the relative abundance of spotted knapweed. This level demonstrates the highest accuracy.

List of Tables:

Table 1: Band number, band names, central wavelength, and full width at half maximum (FWHM) of Parrot Sequoia sensor.
Table 2: Parameters adopted during flight data collection.
Table 3: Relation between each level of GP and GSD.
Table 4: GLCM Extracted Features.
Table 5: Classification of Spotted Knapweed Abundance in Surveyed Sites.
Table 6: Classifier performance metrics used to evaluate classifiers performance.
Table 7: Optimized Features for GP0, GP1, GP2, GP3, GP4 and concatenated GPs.
Table 8: Range of hyperparameter values considered for tuning the Random Forest Classifier using Grid Search.
Table 9: Range of hyperparameter values considered for tuning SVM using Grid Search.
Table 10: Result of RF hyperparameters tuning for GP0 to GP4 and GPs.
Table 11: Result of SVM hyperparameters tuning for GP0 to GP4 and GPs.
Table 12: RF and SVM classification results for the GP0 feature set.
Table 13: RF and SVM classification results for the GP1 feature set.
Table 14: RF and SVM classification results for the GP2 feature set.
Table 15: RF and SVM classification results for the GP3 feature set.
Table 16: RF and SVM classification results for the GP4 feature set.
Table 17: RF and SVM classification result for the concatenated GPs feature set.

Chapter 1 Introduction

Introduction

Invasive plants are non-native species that have become established in new environments, where they alter the structure and function of existing ecosystems and outcompete native biotic communities (Pyšek and Richardson, 2010; Qian and Ricklefs, 2006). Invasive species of all types are estimated to have caused economic losses of at least US$1.288 trillion worldwide since 1970 (Diagne et al., 2021). One notable example in the grasslands and woodlands of western North America is Centaurea stoebe, or spotted knapweed. This species has become a significant management challenge as it has spread across millions of hectares, resulting in substantial costs from both control measures and decreased forage yield (Singh et al., 2022). Land managers recognize the importance of intensive monitoring and early detection in effectively managing invasive species (Hobbs and Humphries, 1995). However, controlling invasions can be challenging due to the large size and complexity of invaded ecosystems (Holden, Nyrop, and Ellner, 2016). Early detection of invasive species has been shown to enhance the cost-effectiveness of treatment strategies (Malanson and Walsh, 2013; Holden et al., 2016). For this reason, accurate and reliable methods for early detection of invasive species are vital. These methods often involve a combination of surveillance, monitoring, and rapid response systems. Advanced technologies, such as remote sensing (RS), DNA analysis, and predictive modeling, are increasingly being employed to improve early detection capabilities (Cassidy, 2020; Martinez et al., 2020).

Preserving Grassland Ecosystems

Detecting and mapping species invasions in grasslands is essential for managing these sensitive ecosystems.
Grasslands, including both sown pasture and rangeland, comprise some of the largest ecosystems worldwide, representing approximately 20 to 40 percent of the Earth's land area (Suttie et al., 2005). Natural grasslands are among the most endangered ecosystems in North America (Samson and Knopf, 1994). Grasslands provide irreplaceable ecosystem services to people and the environment (O’Mara, 2012). However, human use of these grasslands can inadvertently promote the spread of invasive plants, leading to the displacement of native species and a decline in land values and in ecological goods and services (Foster et al., 2020; Gaskin et al., 2021).

Remote Sensing Techniques for Mapping Invasive Plant Species

Field-based visual inspections and plant species inventories are often used for mapping plant species invasions; however, these methods are time-consuming and impractical for large areas (Bradley, 2014). To address these limitations, RS technology, which can collect data over vast spatial extents, has been extensively explored for mapping species invasions (Bradley, 2014; Huang and Asner, 2009; Joshi et al., 2004). Remote sensing techniques have shown promise in mapping invasive plants based on plant characteristics such as seasonal phenology and biochemical, physiological, and structural traits, as well as on the prevalence of invasive species in the study area (Gholizadeh et al., 2022). Many studies have investigated multispectral and hyperspectral imagery to identify invasive plants during flowering and fruiting, when these plants exhibit a spectral response distinct from that of the surrounding green plants (Andrew and Ustin, 2009; Huang and Geiger, 2008; Ishii and Washitani, 2013).
However, the success of these approaches depends not only on the distinct phenology of the target invasive plants but also on access to RS data collected when the target plants are in a specific phenological stage, which requires either time-targeted image acquisition or high-temporal-resolution imagery throughout the growing season. High-spatial-resolution hyperspectral imagery has been used to detect invasive plants based on how their biochemical, physiological, and/or structural traits affect their spectral response (Glenn et al., 2005; Mitchell and Glenn, 2009; Yang and Everitt, 2010); however, this approach is less common because of the cost of acquiring such imagery. The primary challenge in vegetation mapping using RS is precisely differentiating target plants from background vegetation. This challenge is even more pronounced in grasslands due to the small size and sparse canopy of the vegetation (Malanson and Walsh, 2013). Effective detection often requires that the invasive plants form patches that are reasonably uniform and larger than the spatial resolution of the RS imagery (He et al., 2015). Such observations underscore the significance of image spatial resolution in the mapping process (Underwood et al., 2007). Satellite-acquired multispectral imagery offers spatial resolutions that range from tens of meters to meters. Several studies have explored the use of imagery acquired by the NASA/USGS Landsat-8 program (Khare et al., 2018; Matongera et al., 2017; Royimani et al., 2019) and the ESA Sentinel-2 constellation (Duncan et al., 2023; Gholizadeh et al., 2022; Hawryło et al., 2018; Rupasinghe and Chow-Fraser, 2021) for mapping invasive species. However, with resolutions of 30 m and 10-20 m, respectively, imagery from these satellite programs is ill-suited for mapping invasive plants with small or sparse canopies, particularly in the early stages of invasion (Malanson and Walsh, 2013).
For detecting small or fragmented patches of invasive species, researchers have explored imagery acquired by the privately operated WorldView and PlanetScope satellite programs. Imagery from these systems has spatial resolutions of 1.84 m and 3 m, respectively (Lake et al., 2022; Shiferaw et al., 2019). However, even meter-scale imagery is too coarse to support the mapping of invasive species with very small or very sparse canopies (Malanson and Walsh, 2013).

Enhancing Plant Species Invasion Monitoring with Remotely Piloted Aircraft Systems

Recently, there has been increasing interest in using remotely piloted aircraft systems (RPASs) as an RS platform for monitoring and mapping plant species invasions (Dvořák et al., 2015; Hill et al., 2020; Lehmann et al., 2017; Mallmann et al., 2020). One of the key advantages of RPASs is their ability to capture spectral images with unprecedented spatial and spectral resolution (Hill et al., 2020); the acquired data can therefore provide highly detailed and accurate measurements of the target vegetation. Another significant benefit of RPASs is the ease and flexibility with which they can be deployed for imaging missions. Unlike traditional aerial or satellite platforms, RPASs can be quickly launched and maneuvered over specific areas of interest; this enables researchers and land managers to conduct imaging missions with high frequency, increasing the temporal resolution of the collected data (Hill et al., 2020; Klosterman et al., 2018; Klosterman and Richardson, 2017). Access to high-resolution imagery provided by RPASs has revolutionized the field of invasive species monitoring and management. High-resolution RPAS-acquired imagery allows target species within treatment areas to be detected and measured with remarkable precision (Martin et al., 2018; Hill et al., 2017; Tamouridou et al., 2017).
Researchers can identify and quantify the extent of invasive species presence, monitor their spread, and assess the effectiveness of control measures. This level of detail and accuracy is particularly valuable when dealing with very small or very sparse canopies, where satellite-based imagery may not be as effective (Gholizadeh et al., 2022; Malanson and Walsh, 2013).

Machine Learning Algorithms for Mapping Invasive Plants

Machine learning algorithms (MLAs) have become a powerful tool in RS image analysis due to their ability to model high-dimensional, non-linear data with complex interactions and to overcome challenges associated with data gaps (Thessen, 2016). Many machine learning-based classification techniques exist, both parametric and nonparametric, and their applications extend beyond RS classification. For instance, these techniques have found utility in areas such as medical imaging (Erickson et al., 2017), financial forecasting (Kamalov et al., 2021), and text classification (Ikonomakis et al., 2005). Within RS, they play a pivotal role in tasks such as land cover mapping (Petropoulos et al., 2012), vegetation health monitoring (Hawryło et al., 2018; Selvaraj et al., 2020), and urban development tracking (Shafizadeh-Moghadam et al., 2017). Because machine learning algorithms can learn and model such complex patterns, they enable the identification and classification of invasive plants with improved accuracy and precision compared to traditional manual methods or basic automated processes (Mountrakis et al., 2011). The integration of ancillary data, such as environmental variables or topographic features, can further enhance classification performance (Ng et al., 2016; Nininahazwe et al., 2023).
Objectives

In the southern interior of British Columbia, the invasive spotted knapweed (Centaurea stoebe) is not only a growing concern but has also been placed in the Regional Containment/Control priority category established by Provincial Priority Invasive Species BC (Inter-Ministry Invasive Species Working Group, March 2021). Recently, Baron and Hill (2020) developed a method, called metapixel-based image classification, for mapping spotted knapweed in a grassland ecosystem using RPAS-acquired multispectral imagery. Metapixel-based classification segments the study area into non-overlapping squares larger than the image resolution, termed metapixels, and derives spectral features from the image pixels within each metapixel. Baron and Hill (2020) used a metapixel size of 1 m², corresponding to the size of quadrats commonly used by range managers, to determine the relative abundance of target species. Their study showed that by applying this method and using second-order spatial statistics derived from the grey level co-occurrence matrix (GLCM) of the metapixels (Haralick et al., 1973), the relative abundance of spotted knapweed in each metapixel could be determined with an overall accuracy of 66.0% when validated with an independent dataset (Baron and Hill, 2020). This study aims to expand on the previous work by exploring the impact of image spatial resolution on mapping spotted knapweed in a grassland ecosystem. Due to weight limitations, RPAS-based RS often employs less accurate sensors than satellite or conventional aircraft-based RS (Hill et al., 2019). Increasing the ground area corresponding to each pixel (that is, increasing the ground-resolved distance) increases the amount of spatial averaging that goes into the measurement associated with that pixel. While this averaging smooths the image by reducing high-frequency detail, it also increases the variability of the ground surface contained within each pixel, leading to image blurring.
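The spatial averaging described above can be illustrated with a small numerical sketch. This is only an illustration under stated assumptions, not the procedure used in this thesis: it uses plain block averaging rather than a Gaussian pyramid, and the function name `block_mean` and the toy 4 × 4 array are hypothetical.

```python
import numpy as np


def block_mean(image: np.ndarray, factor: int) -> np.ndarray:
    """Average non-overlapping factor x factor blocks of a 2-D image.

    Each output pixel covers factor times the ground-resolved distance
    of an input pixel along each axis (e.g. 2.9 cm -> 5.8 cm for factor=2).
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    # Trim edge rows/columns that do not fill a complete block.
    rows = image.shape[0] - image.shape[0] % factor
    cols = image.shape[1] - image.shape[1] % factor
    trimmed = image[:rows, :cols].astype(float)
    # Split each axis into (block index, position within block) and average
    # over the within-block positions.
    blocks = trimmed.reshape(rows // factor, factor, cols // factor, factor)
    return blocks.mean(axis=(1, 3))


# Hypothetical toy image standing in for one spectral band.
fine = np.arange(16, dtype=float).reshape(4, 4)
coarse = block_mean(fine, 2)
print(coarse)                    # 2 x 2 image of block means
print(fine.var(), coarse.var())  # pixel-to-pixel variance drops after averaging
```

In this toy case the variance between pixels falls after averaging, which is the smoothing effect; the detail that was averaged away is the variability now hidden inside each coarser pixel.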
Building on these insights, I hypothesize that there exists an optimal spatial resolution for spectral features used in classifying spotted knapweed within multispectral imagery. Note that "features" here refers to the descriptive attributes of a metapixel, not to physical elements present within the image scene, such as trees or grass. To investigate this hypothesis, this study is driven by two primary objectives:

1) Determine if there is a relationship between spatial resolution and image classification accuracy that can help identify an optimal spatial resolution for mapping spotted knapweed using multispectral imagery.
2) Determine if image features at multiple spatial resolutions improve or hinder the identification of spotted knapweed using multispectral imagery.

Study Site

The data used in this work were collected within the Laurie Guichon Memorial Grasslands Interpretive Site (LGMGIS), which is located south of Merritt, British Columbia (BC), Canada. This 100-hectare site is situated in Canada's Western Cordillera physiographic region and is classified as representing BC’s Interior Douglas Fir dry hot ecosystem subzone. In a previous study conducted by Baron and Hill (2020), three field sites were selected within the LGMGIS. Each field site covered approximately 1 hectare and exhibited a gentle slope with a southern aspect. Figure 1 shows the locations of these three field sites within the LGMGIS.

Figure 1: Locations of the 3 field sites in Laurie Guichon Memorial Grasslands Interpretive Site (LGMGIS) explored in this study. This map uses the WGS 84 UTM Zone 10N coordinate system.

Data Acquisition and Image Processing

The data used in this research were collected by Jackson Baron as part of his thesis work (Baron, 2019).
These data were collected using a Parrot Sequoia multispectral sensor, featuring a 16-megapixel digital camera and four 1.2-megapixel global shutter single-band imagers, accompanied by an incident light sensor and GPS. The sensor was securely mounted on a DJI Phantom 4 RPAS for aerial data acquisition. Imaging flights were carried out between 11:00 a.m. and 1:00 p.m. July 04, July 12, and July 19, 2018. On July 04, imagery was acquired at all 3 field sites. However, due to technical difficulties with the Phantom 4 RPAS, data was not acquired from Site 1 on July 12 or from Site 3 on July 19. The Parrot Sequoia sensor, which features a 16-megapixel camera and an in-built GPS, was the primary tool. The sensitivities of this sensor's single-band imagers are detailed in Table 1. For optimal results considering both safety and resolution, a flight altitude of 30m was maintained. All conducted flights were in compliance with the 2018 Canadian Aviation Regulations. More specifics on the flight parameters can be found in Table 2. Table 1: Band number, band names, central wavelength, and full width at half maximum (FWHM) of Parrot Sequoia sensor. Band Number Nominal Reflectance Centered Wavelength FWHM (nm) 1 Green 550 40 2 Red 660 40 3 Red Edge 735 10 4 NIR 790 40 15 Table 2: Parameters adopted during flight data collection. Height Above Forward Overlap Side Overlap Ground Level(m) (%) (%) 30 80 80 Time of Day Max Wind Speed (km/h) 11:00a.m. 1:00p.m. 40 Every scene captured by the Sequoia's single-band imagers assigns a digital number (DN) to each pixel. These DNs are linked to the radiance (measured in Wm-2 sr-1) reflected from the land surface over the pixel area. This relationship can be expressed through an equation provided by Parrot (2017). 𝐷𝑁−𝐵 𝐿 = 𝑓 2 𝐴𝜀𝛾+𝐶 (1) Where 𝑫𝑵 represents the digital value assigned to each pixel. The exposure time of the image, given in seconds, is denoted by 𝜺. The ISO is represented by 𝜸. 
The f-number of the imager, symbolized by $f$, is set at 2.2; it is the ratio of the focal length to the aperture diameter of the lens. Calibration coefficients are denoted by $A$, $B$, and $C$. These values are stored within the exchangeable image file (EXIF) metadata during the image capture process.

Additionally, during image capture by the single-band imagers, the Sequoia sensor’s incident light sensor records a radiance level, denoted as $\Psi$. This reading is related to the irradiance, $E$ (measured in $\mathrm{W\,m^{-2}}$), that the land surface receives over the area represented by a pixel. This relationship is given by Tu et al. (2018):

$$E = a\,\frac{\Psi}{G\,\Gamma} \tag{2}$$

where $E$ is the irradiance on the land surface, $a$ is a constant, $\Psi$ is the radiance detected by the incident light sensor, $G$ is the sensor's gain, and $\Gamma$ is the time taken for measurement acquisition. The values of $\Psi$, $G$, and $\Gamma$ are recorded in the image's EXIF metadata when the image is captured.

The surface reflectance, $\rho$, is determined in the post-processing stage for the images taken by the single-band imagers, following Tu et al. (2018):

$$\rho = K\,\frac{L}{E} \tag{3}$$

where $L$ is computed using Equation 1, $E$ is computed using Equation 2, and $K$ is a normalization constant related to the ratio of the solid angle of the incident light sensor to that of each pixel within the imager. For each flight, the normalization constant $K$ is approximated using data from a ground-based, calibrated reflectance target. A Parrot Sequoia Calibration Target (Parrot, Paris, France) with known reflectance values (green: 18.4%, red: 19.7%, red edge: 22.7%, NIR: 27.6%) was positioned near the take-off and landing sites, and images of this target were captured at the beginning and end of each flight.
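The chain from digital number to reflectance in Equations 1 to 3 can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not the Pix4Dmapper implementation: the function names are hypothetical, the calibration coefficients and sensor readings are invented placeholder values rather than real EXIF data, and the constant $a$ is set to 1 for simplicity. Only the f-number (2.2) and the green-band target reflectance (18.4%) come from the text.

```python
def radiance(dn, exposure_s, iso, a, b, c, f_number=2.2):
    """Equation 1: radiance from a pixel's digital number (DN)."""
    return f_number ** 2 * (dn - b) / (a * exposure_s * iso + c)


def irradiance(psi, gain, acquisition_s, a_const=1.0):
    """Equation 2: irradiance from the incident light sensor reading."""
    return a_const * psi / (gain * acquisition_s)


def estimate_k(target_reflectance, l_target, e_target):
    """Equation 3 rearranged for K, using pixels over the calibration target."""
    return target_reflectance * e_target / l_target


def reflectance(l, e, k):
    """Equation 3: surface reflectance."""
    return k * l / e


# Hypothetical calibration coefficients and sensor readings (illustration only).
coeffs = dict(a=0.5, b=4000.0, c=0.01)
e_flight = irradiance(psi=52000.0, gain=2.0, acquisition_s=0.05)

# Green-band calibration target with a known reflectance of 18.4 %.
l_target = radiance(dn=20000, exposure_s=0.001, iso=100, **coeffs)
k_green = estimate_k(0.184, l_target, e_flight)

# Apply the estimated K to another pixel from the same flight and band.
l_pixel = radiance(dn=12000, exposure_s=0.001, iso=100, **coeffs)
rho_pixel = reflectance(l_pixel, e_flight, k_green)
print(f"K = {k_green:.4g}, pixel reflectance = {rho_pixel:.3f}")
```

Because $K$ is solved from the target pixels, applying it back to the target reproduces the known reflectance exactly, which is a useful sanity check on any implementation of these equations.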
During the processing phase, Pix4Dmapper, a photogrammetry and RPAS mapping software package that implements the structure-from-motion (SfM) algorithm (Turner et al., 2012), identified the calibration target in the imagery. Given the known reflectance of the calibration target, Equations 1, 2, and 3 were used to estimate the normalization constant ($K$) from the image pixels covering the target. The RPAS images were then processed and combined into orthomosaicks using Pix4Dmapper. The orthomosaicked images produced by Pix4D are georeferenced using coordinates from the global navigation satellite system (GNSS) receiver embedded in the Sequoia sensor. However, this georeferencing can be regarded only as a first-order approximation, because GNSS positioning errors can reach 10 meters. To enhance accuracy, 6 ground control points (GCPs) per site were employed. This processing created orthomosaicks from the four calibrated single-band sensors (Green, Red, Red-Edge, and Near Infrared (NIR)) with a ground-resolved distance of 2.9 cm.

The quadrats surveyed in the field were located on the orthomosaicks so that the corresponding sub-images could be extracted. These locations were identified visually and manually digitized based on markers, because visual marker identification was more reliable than the GPS positions, which had an accuracy of 60 cm. Each sub-image, equivalent to a 1 m² metapixel, represented an area surveyed in the field and contained 1156 pixels (34 rows by 34 columns).

Given the variability in spotted knapweed density across the surveyed quadrats and the relatively small number of quadrats surveyed, percent cover of spotted knapweed within each quadrat was categorized into qualitative classes.
Quadrats in which spotted knapweed was not present or only present in trace amounts were classified as “None”, quadrats in which spotted knapweed did not exceed 25% cover were classified as “Moderate”, and quadrats in which spotted knapweed exceeded 25% cover were classified as “High”. This categorization was essential for effectively training a classifier because it increased the number of examples representing each class of spotted knapweed cover. The class boundaries were chosen to balance representing the distribution of cover against retaining enough examples of the less-represented spotted knapweed concentrations. The complete dataset consists of 181 measured quadrats: 51 classified as None, 63 classified as Moderate, and 67 classified as High.

Subsequently, a set of 84 features was extracted for each metapixel. This includes:

• Eight features that capture the mean and standard deviation of the pixel-level reflectance values for every spectral band.

• Six features derived from the mean and standard deviation of three multiband spectral indices computed for each pixel within the metapixel.

• The remaining 70 features, obtained through GLCM-based texture analysis of each of the four spectral bands and three multiband indices.

For each pixel within the metapixels, three multiband spectral indices were computed. These were derived from the normalized difference vegetation index (NDVI) and calculated by contrasting the Near Infrared (NIR) band with each of the remaining spectral reflectance bands. The index computed using the reflectance values from the red and NIR bands is designated NDVI (Carlson and Ripley, 1997). The index comparing the green and NIR band reflectance values is denoted gNDVI. Lastly, the index comparing the red-edge and NIR band reflectance values is labeled reNDVI.
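As a minimal sketch of how the six index-based features could be derived for one metapixel, the following Python code computes the three normalized difference indices per pixel and then summarizes each by its mean and standard deviation. The function names are hypothetical, the reflectance arrays are synthetic random values rather than data from this study, and the use of the population standard deviation and of NaN for zero denominators are assumptions made for this illustration.

```python
import numpy as np


def normalized_difference(nir, band):
    """Return (nir - band) / (nir + band), with NaN where the sum is zero."""
    nir = np.asarray(nir, dtype=float)
    band = np.asarray(band, dtype=float)
    total = nir + band
    result = np.full(total.shape, np.nan)
    np.divide(nir - band, total, out=result, where=total != 0)
    return result


def index_features(green, red, red_edge, nir):
    """Mean and standard deviation of NDVI, gNDVI, and reNDVI over a metapixel."""
    indices = {
        "NDVI": normalized_difference(nir, red),
        "gNDVI": normalized_difference(nir, green),
        "reNDVI": normalized_difference(nir, red_edge),
    }
    features = {}
    for name, values in indices.items():
        features[f"{name}_mean"] = float(np.nanmean(values))
        features[f"{name}_std"] = float(np.nanstd(values))
    return features


# Synthetic 34 x 34 metapixel of reflectance values (not real data).
rng = np.random.default_rng(0)
green, red, red_edge = (rng.uniform(0.05, 0.25, (34, 34)) for _ in range(3))
nir = rng.uniform(0.30, 0.60, (34, 34))

features = index_features(green, red, red_edge, nir)
for name, value in features.items():
    print(f"{name}: {value:.3f}")
```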
Each index was computed as follows (Baron and Hill, 2020):

$$NDVI = \frac{\rho_{NIR} - \rho_{R}}{\rho_{NIR} + \rho_{R}} \tag{4}$$

$$gNDVI = \frac{\rho_{NIR} - \rho_{G}}{\rho_{NIR} + \rho_{G}} \tag{5}$$

$$reNDVI = \frac{\rho_{NIR} - \rho_{RE}}{\rho_{NIR} + \rho_{RE}} \tag{6}$$

where $\rho_{G}$, $\rho_{R}$, $\rho_{RE}$, and $\rho_{NIR}$ represent the reflectance values of the green, red, red-edge, and NIR spectral bands, respectively.

Roadmap of the Thesis

Chapter 1 has provided an overview of the research problem, the motivation behind the study, and the objectives to be achieved. It has also introduced the key concepts and theories that form the foundation of the research, along with an orientation to the dataset explored in this work. The following chapters describe the methodologies employed in the study and then present the results obtained.

Chapter 2 - Methodology and Experimental Setup: This chapter describes the research techniques used to meet the objectives of this work.

Chapter 3 - Results and Discussion: This chapter reports and analyses the research outcomes. It covers the findings obtained by applying the methods described in Chapter 2 to the chosen dataset. A thorough analysis of the results is presented, supported by visuals and performance metrics. The chapter also places these findings and their contribution within the existing body of knowledge regarding RS-based invasive species mapping.

Chapter 4 - Recommendations and Future Work: This chapter summarizes the recommendations stemming from the research's key findings. It presents actionable suggestions for addressing the challenges and gaps identified, and it discusses potential areas of further research and exploration.

Chapter 2 Method

Introduction

This chapter describes the methodologies employed in this work to map invasive species. Previous work on mapping invasive plants has shown substantial advancements in detection through the use of innovative methodologies for analyzing remote sensing imagery.
Baron and Hill (2020) and Kattenborn et al. (2019) both employed RPAS-acquired imagery for assessing woody invasive species in grasslands and forests, respectively. They emphasized the importance of texture analysis in achieving accurate predictions. Michez et al. (2016), Dorigo et al. (2012), and Du et al. (2021), in turn, utilized various remote sensing methods for detecting invasive plants, showcasing the effectiveness of integrating spectral, spatial, and temporal characteristics with gray level co-occurrence matrix (GLCM) texture measures. Du et al. (2021) further highlighted the superiority of object-based analysis over pixel-based methods in classifying wetland plant communities.

Recently, multi-scale analyses utilizing a Gaussian pyramid (GP) model have been explored to improve image classification by creating features at progressively larger spatial scales. This approach marks a novel development in the context of invasive species analysis.

Several studies have explored the use of GP and GLCM separately for feature extraction and classification. These methods have proven to enhance classification accuracy and are particularly useful in multi-resolution spatial analyses. In the studies by Behrens et al. (2018) and Yin and Cui (2021), GP played a pivotal role. Behrens et al. (2018) utilized lower spatial resolution levels of GP to effectively extract terrain attributes from a digital elevation model (DEM), enabling analysis at various scales. Yin and Cui (2021) developed a multi-scale feature extraction and classification approach for hyperspectral images, integrating GP with weighted voting. By breaking down hyperspectral images into multiple scales with GP and applying weighted coefficients based on spectral angle distance, they significantly improved classification accuracy, underscoring the efficacy of multi-scale analysis.
Furthermore, the combination of GLCM-based texture measures with GP-based multiscale feature creation has been shown to significantly improve image classification in several fields, including biomedical imaging, as evidenced by Liu et al. (2022) and Ataky et al. (2023), and other image datasets, as noted by Roberti de Siqueira et al. (2013). However, the combined use of GLCMs and GP for feature creation remains unexplored in remote sensing image analysis. In summary, these studies demonstrate the versatility of the GP and GLCM in extracting features and performing classification across various domains and image processing applications.

The subsequent sections of this chapter describe the application of machine learning (ML), focusing on the two methods that will be used in this work: random forests (RFs) and support vector machines (SVMs). These ML methods will be used to develop models that predict the amount of spotted knapweed cover based on multiscale GLCM features derived using the GP. The following section provides an overview of machine learning, followed by an introduction to the RF and SVM models that will be employed. Additionally, we will discuss feature creation using GLCMs and GPs, feature selection, hyperparameter tuning, and other pertinent aspects of RF and SVM model building.

Machine Learning: an Overview

In the realm of remote sensing imagery, ML has surged to the forefront as an indispensable tool, propelled by the rapid growth of big data technologies and high-performance computing capabilities. These advancements have made ML a critical asset in deciphering complex, data-rich environmental processes. As Liakos et al. (2018) highlight, a fundamental attribute of ML is its capacity to enable machines to autonomously learn to replicate patterns present in input data, thus circumventing the limitations imposed by traditional programming.
This autonomous learning is accomplished by constructing computational models that capture real-world phenomena through input-output relationships derived from extensive datasets. Such ML models are particularly well suited to representing these relationships, including any nonlinearity and discontinuity. This allows them to uncover intricate patterns that are often too complex for closed-form mathematical equations to represent accurately and that may remain undetected by traditional non-ML empirical modeling methods, such as regression analysis.

ML methods can be categorized into broad groups, distinguished by the nature of the learning involved (supervised or unsupervised) as well as by the models and specific methodologies employed, such as classification, regression, clustering, and dimensionality reduction (Alpaydin, 2020). Supervised learning requires a set of examples that demonstrate the input-output pattern to be modeled, whereas unsupervised learning infers the output based on the input data alone. Each example is characterized by a set of features serving as predictors for the desired output. If a supervised learning method is used, the output (e.g., a class label) corresponding to each example is also provided. Using statistical optimization, the parameters of the ML model are refined to improve performance on the particular problem at hand. This iterative optimization and learning phase, known as "training," uses a designated set of examples and outputs (for supervised learning only) called "training data" to guide the model development. Once trained, an ML model can be used not only to predict outcomes for new, unseen examples, but also to facilitate a deeper understanding of the underlying data relationships through the examination of its parameters (Alpaydin, 2020; Baştanlar and Özuysal, 2014).
This work will employ two supervised classification methods, namely random forest (RF) and support vector machine (SVM).

In addition to the training data, another critical input for ML models is their hyperparameters, which dictate the model's structure and the details of its training protocol. These hyperparameters, which are set prior to training, directly affect how well and how efficiently the model learns. The efficacy of an ML model depends on various factors, including the quantity and quality of the training data, the complexity of the relationships between input and output variables, and practical constraints such as the time and memory resources available for training (Baştanlar and Özuysal, 2014). In this context, the selection and optimization of hyperparameters are paramount, as they match the model's architecture to the complexity of the task.

The performance of a machine learning model in a specific task is quantified by evaluating the model using a set of examples and associated outputs known as the validation data. These data are held out from the training process and, thus, can demonstrate whether the learned model will generalize to new cases. Several performance metrics have been proposed, and this work will use four popular metrics, namely, accuracy, precision, recall, and the F1 score (Murphy, 2012).

▪ Accuracy is the ratio of correct predictions to the total number of predictions. It measures the overall correctness of the model.

▪ Precision is the ratio of correctly predicted positive observations to the total predicted positives.

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{7}$$

where TP is the number of true positives and FP is the number of false positives.

▪ Recall (or Sensitivity or True Positive Rate) is the ratio of correctly predicted positive observations to all actual positives.
$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{8}$$

where TP is the number of true positives and FN is the number of false negatives.

▪ F1 Score is the harmonic mean of precision and recall, balancing the two.

$$F1\ \mathrm{Score} = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{9}$$

▪ Macro Average averages the precision, recall, and F1-score for each class independently, treating every class with equal importance. It gives a sense of the model's overall performance across all classes without taking their distribution into account.

▪ Weighted Average takes the class distribution into account by assigning a weight to each class's performance metrics (such as precision and recall) based on how frequently each class appears in the dataset. This metric can offer a more nuanced view of model performance when the test set mirrors the real-world distribution of classes. Nevertheless, in cases of significant class imbalance, it may over-represent the model's efficacy on the more dominant classes, potentially overestimating the overall performance if the model is better at predicting the majority classes than the minority ones.

▪ Support refers to the number of instances of each class in the dataset, which directly affects the weighted average and is an important factor to consider when assessing model performance.

These metrics are used together because they reveal different aspects of the model performance, especially in scenarios with imbalanced class distributions, where accuracy alone can be misleading (Murphy, 2012).

Data Partitioning: Training and Validation Data

Predictive modeling using ML often involves tuning parameters and possibly feature selection to allow accurate replication of the input/output pattern. Optimal parameter settings and relevant features must be determined from the existing data. In machine learning, the complete data set is typically divided into training and validation subsets.
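To make the performance metrics defined in the preceding section concrete (Equations 7 to 9, together with the macro average, weighted average, and support), the following sketch evaluates a small set of invented labels with scikit-learn. The labels are hypothetical and chosen only so the arithmetic is easy to follow; they are not results from this study.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

labels = ["None", "Moderate", "High"]

# Invented reference labels and predictions for ten quadrats (illustration only).
y_true = ["None", "None", "None", "Moderate", "Moderate",
          "High", "High", "High", "High", "High"]
y_pred = ["None", "None", "Moderate", "Moderate", "High",
          "High", "High", "High", "High", "None"]

accuracy = accuracy_score(y_true, y_pred)

# Per-class precision, recall, F1 score, and support, in the order of `labels`.
precision, recall, f1, support = precision_recall_fscore_support(
    y_true, y_pred, labels=labels, zero_division=0
)

# Macro average: unweighted mean over classes.
macro_p, macro_r, macro_f1, _ = precision_recall_fscore_support(
    y_true, y_pred, labels=labels, average="macro", zero_division=0
)

# Weighted average: mean over classes weighted by support.
weighted_p, weighted_r, weighted_f1, _ = precision_recall_fscore_support(
    y_true, y_pred, labels=labels, average="weighted", zero_division=0
)

print(f"accuracy = {accuracy:.3f}")
for name, p, r, f, s in zip(labels, precision, recall, f1, support):
    print(f"{name:<9} precision={p:.3f} recall={r:.3f} f1={f:.3f} support={s}")
print(f"macro F1 = {macro_f1:.3f}, weighted F1 = {weighted_f1:.3f}")
```

In this toy example the largest class (High) is also the best predicted, so the weighted F1 score is higher than the macro F1 score, which illustrates how the weighted average can favor dominant classes.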
The training set aids in model construction, tuning, and feature selection, while the validation set assesses predictive performance (Kuhn and Johnson, 2013).

The model's performance is influenced by the distribution of output variables in both training and validation sets. An unbalanced training set, where output variables aren't distributed uniformly, can lead to a model bias. This bias makes the model proficient at predicting dominant outputs but weak at predicting less frequent outputs. Similarly, an unbalanced validation set can skew performance measures, emphasizing the dominant outputs' prediction accuracy.

For classification models with discrete output variables, stratified random sampling, as opposed to simple random sampling, is advised for partitioning data to training and validation sets (Kuhn and Johnson, 2013). When aiming for balanced training and validation sets, the least represented class often determines the size of the training data set. To ensure diversity in more frequent classes, resampling or bootstrapping is common, generating multiple training and validation set versions. This technique has been proven to enhance model performance (Kuhn and Johnson, 2013) and will be used in this work.

Cross Validation

Cross-validation is a strategy to systematically partition a training set. This division allows a segment of the data to be utilized for model training while the remainder assesses the model’s performance, guiding its training by adjusting hyperparameters or halting it if ineffective (Fushiki, 2011). This iterative process ensures all training set examples contribute to both training and model assessment. Beyond mitigating overfitting, cross-validation-derived performance metrics tend to be more reliable than those computed without this procedure (Wong and Yeh, 2020).

K-Fold Cross Validation

K-fold cross-validation divides the training set into 'k' equally sized subsets, or folds.
Membership in these subsets is determined through random selection. The model is initially trained using all folds except the first, and it then predicts the outcomes for the excluded fold to evaluate performance. This cycle repeats, with each fold being excluded once. The outcome is 'k' distinct performance estimates, typically summarized as a mean and standard error. This aggregated information provides insight into the influence of model hyperparameters (Kuhn and Johnson, 2013). The cross-validation process with k = 3 is depicted in Figure 2.

Figure 2: A diagram demonstrating threefold cross-validation. Symbols represent training set samples, divided into three groups. Sequentially, each group is excluded during model training. Performance estimates, such as error rate, are derived for each withheld sample set. Averaging these performance metrics gives the cross-validation estimate of model performance.

The choice of the number of folds (k) to use in cross-validation is important. While 5 and 10 are common choices, there is no universal rule. With a larger number of folds, each model is trained on a sample whose size approaches that of the complete training set. For example, when k equals the total training set size (N), termed "leave-one-out" cross-validation, the model trains on N-1 samples and one sample is reserved in each training cycle for performance estimation. This approach reduces bias, where "bias" is the difference between estimated and true performance values. Thus, a larger k value, like 10, tends to display less bias than a smaller k value, such as 5 (Fushiki, 2011).

In this study, we applied a dual-validation approach to the machine learning model. Initially, a 10-fold cross-validation on the training dataset was used to optimize the model parameters and ensure its robustness. Each fold cyclically trained and tested the model, facilitating parameter refinement.
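The 10-fold cross-validation on a training set, combined with a separate held-out validation set, can be sketched with scikit-learn as follows. This is a sketch under stated assumptions: the data are synthetic (generated with `make_classification`), the 70/30 stratified split and the small forest of 50 trees are arbitrary choices for a quick illustration, and none of the values reflect the settings or results of this study.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score, train_test_split

# Synthetic three-class dataset standing in for the metapixel features.
X, y = make_classification(
    n_samples=181, n_features=20, n_informative=5, n_classes=3, random_state=0
)

# Hold out a stratified validation set that is never used during training.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# 10-fold cross-validation on the training set only, with random fold membership.
model = RandomForestClassifier(n_estimators=50, random_state=0)
folds = KFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(model, X_train, y_train, cv=folds, scoring="accuracy")

mean_score = scores.mean()
std_error = scores.std(ddof=1) / np.sqrt(len(scores))
print(f"cross-validation accuracy: {mean_score:.3f} +/- {std_error:.3f} (SE)")

# Final check on the independent validation set.
model.fit(X_train, y_train)
val_accuracy = model.score(X_val, y_val)
print(f"validation accuracy: {val_accuracy:.3f}")
```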
Subsequently, once the model was developed, an independent validation set, separate from the training data set, was used to evaluate the model's capability to generalize and make accurate predictions on unseen cases.

Recursive Feature Elimination (RFE)

Feature selection involves assessing the features in a dataset to identify their effect on the outcome. The goal is to achieve the same or higher accuracy while reducing the dimensionality of data with many features. The Recursive Feature Elimination (RFE) technique evaluates the classifier's performance while systematically removing features. Features that can be removed without a significant decrease in model performance are considered expendable, while features that cannot be removed without a significant performance decrease are considered to represent the optimal feature set.

RFE begins by considering the full set of features and calculating the model's performance. It also assesses the importance or ranking of each feature within the classifier. Through iterative cycles, subsets are generated by progressively eliminating features. In each iteration, the classifier is retrained, its performance reassessed, and the importance or rankings of the remaining features recalculated (Svetnik et al., 2004; Ustebay et al., 2018).

Various techniques are often utilized for evaluating the model performance during RFE, including Recursive Feature Elimination with Cross-Validation (RFECV) (Misra and Yadav, 2020), Principal Component Analysis (PCA) (Wibawa and Novianti, 2017), and measures like Information Gain Ratio and Information Gain (IG) (Adi et al., 2019). In RFECV, the training data are divided into folds, and RFE is performed on each fold. The cross-validation scores of the resulting classifiers are averaged, and the number of features that produces the best score is taken to be the optimal number (Nopt) of features.
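A minimal sketch of RFECV using the `RFECV` class from scikit-learn is given below. The data are synthetic, and the estimator settings (a 30-tree random forest, 5 folds, and removal of one feature per iteration) are assumptions chosen to keep the example fast; they are not the settings used in this study. After cross-validation, `RFECV` refits RFE on all of the supplied data to select the final features.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold

# Synthetic data: 12 candidate features, of which 4 are informative.
X, y = make_classification(
    n_samples=150, n_features=12, n_informative=4, n_redundant=0,
    n_classes=3, random_state=0,
)

selector = RFECV(
    estimator=RandomForestClassifier(n_estimators=30, random_state=0),
    step=1,                      # eliminate one feature per iteration
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="accuracy",
    min_features_to_select=1,
)
selector.fit(X, y)

n_opt = selector.n_features_          # cross-validated feature set size
selected_mask = selector.support_     # True for each retained feature
X_selected = selector.transform(X)    # data reduced to the retained features

print(f"optimal number of features: {n_opt}")
print(f"selected feature indices: {[i for i, keep in enumerate(selected_mask) if keep]}")
```

Because fold assignment and random forest training are both stochastic, repeating this procedure with different random seeds can return different feature sets, which is why repeated runs are worth considering.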
Finally, RFE is performed using the entire training dataset to identify the optimal set of Nopt features. Misra and Yadav (2020) have shown that the RFECV method can enhance the performance of classification algorithms. Additionally, Chang et al. (2019) have demonstrated that employing the RFECV algorithm with 10-fold cross-validation significantly improves the precision of algorithms such as RF and Extreme Gradient Boosting (XGBoost), even with a reduced attribute set.

For this study, a two-step recursive feature elimination process was developed using the RFE and RFECV methods in the open-source Python library scikit-learn. In the first step, RFECV was employed to construct feature sets of all sizes from 1 to N features, where N is the number of features available. For each feature set size, the cross-validation score was calculated. This process was repeated to account for stochasticity in the fold selection and the training process, and the average cross-validation score was calculated. The average cross-validation score was plotted against the size of the feature set, and a visual analysis was conducted to identify the optimal feature set size (Nopt). The feature set size corresponding to the highest average cross-validation score was selected unless there were ties. In the case of ties for the highest average cross-validation score, the principle of parsimony was applied, and the smallest of the tied feature set sizes was selected. In the second step, RFECV was repeated, this time to select which Nopt features to include in the optimal feature set. This selection process was repeated to address stochasticity in the fold selection and model training. Finally, the optimal feature set was defined to be the Nopt features that were selected by the majority of these repeated RFECV runs.

Random forest

This section delves into the RF algorithm, highlighting the core mechanisms behind the construction of its ensemble of trees.
The details governing the tree-building process are explored, alongside the role of hyperparameter tuning in optimizing the RF model's performance. The RF, a supervised machine learning model, is implemented in this study using the open-source Python library scikit-learn. An RF comprises an ensemble of decision trees (Breiman, 2001; Mingers, 1989). Decision trees hierarchically partition the data from the root node down to the leaf nodes, as depicted in Figure 3.

Figure 3: Decision tree structure showcasing root nodes, decision nodes, leaf nodes, and branches.

Decision trees are differentiated into classification trees, which are designed for qualitative response variables, and regression trees, tailored for quantitative response variables (Strobl et al., 2009). A known characteristic of decision trees is their sensitivity to the training data; even minor changes in the data can significantly alter the tree's structure, potentially leading to poor generalization in the resulting classifier (Sheykhmousa et al., 2020).

To enhance generalization, the predictions from multiple trees can be aggregated in an ensemble method. The RF algorithm adopts this strategy by fitting numerous individual trees, often hundreds or thousands, and integrating their predictions to produce a final outcome (Elith, 2019; Pal, 2005; Strobl et al., 2009). The model user must define the number of trees, a hyperparameter, which contributes to mitigating the over-specificity of single trees and leads to more reliable predictions (Hastie et al., 2009).

In RF, each tree is constructed from a bootstrap sample of the training data, drawn with replacement and equal in size to the training dataset. On average, these samples contain about 63.2% unique records, the so-called in-bag samples, while the remaining unchosen records, the out-of-bag samples, help estimate the model's error rate (Efron and Tibshirani, 1994).
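The 63.2% figure follows from the probability that a given record is never drawn in N draws with replacement, $(1 - 1/N)^N$, which approaches $e^{-1} \approx 0.368$ as N grows. The short simulation below, a sketch with an arbitrary N of 1000 and 200 repetitions, compares the simulated fraction of unique in-bag records with this expression.

```python
import math

import numpy as np

rng = np.random.default_rng(0)
n_records = 1000      # arbitrary training set size for the illustration
n_repeats = 200       # number of simulated bootstrap samples

fractions = []
for _ in range(n_repeats):
    # Draw a bootstrap sample of record indices, with replacement.
    sample = rng.integers(0, n_records, size=n_records)
    fractions.append(np.unique(sample).size / n_records)

simulated = float(np.mean(fractions))
expected = 1.0 - (1.0 - 1.0 / n_records) ** n_records
limit = 1.0 - math.exp(-1.0)

print(f"simulated in-bag fraction: {simulated:.4f}")
print(f"1 - (1 - 1/N)^N:           {expected:.4f}")
print(f"1 - 1/e:                   {limit:.4f}")
```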
RF differentiates itself from other ensemble methods such as bagging by introducing randomness in the selection of predictors at each split, evaluating only a subset of predictors to find the optimal one. This strategy generates decorrelated trees and reduces overfitting risks (Breiman, 2001; Elith, 2019; Strobl et al., 2009). RF typically constructs "deep" trees, unpruned with many splits, which may lead to terminal nodes that have few data points. The minimum sample size for terminal nodes and the tree depth are adjustable through hyperparameters, whose names and exact behavior vary with the software implementation (Elith, 2019; Strobl et al., 2009; Valavi et al., 2021).

Once trained, the RF model is applied to new, unlabeled input data. Each decision tree in the ensemble contributes a vote towards a class membership for each data point. The class that receives the majority of votes is assigned as the prediction for the given input (Breiman, 2001; Klusowski, 2018).

Optimizing RF Performance: Hyperparameter Tuning

The RF is a flexible and powerful algorithm, but it requires careful calibration of various hyperparameters to deliver optimal performance. The choice of hyperparameters can significantly affect the accuracy, generalizability, and efficiency of the RF model. These include the number of observations selected at random for each tree and whether they are drawn with or without replacement, the number of variables chosen randomly for each split, the criterion for splitting, the minimum required samples within a node, and the total number of trees in the ensemble (Probst et al., 2019). A significant part of applying the RF model is the configuration of its two primary hyperparameters: the number of trees (Ntree), and the number of randomly selected features to be considered at each split (Mtry).
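As a hedged illustration of these two hyperparameters, the sketch below fits a random forest with scikit-learn, where Ntree and Mtry correspond to the `n_estimators` and `max_features` arguments. The data are synthetic, with the same shape as the dataset in this work (181 examples, 84 features) but no other connection to it, and the values of 500 trees and the square root rule are common defaults from the literature rather than tuned results. The out-of-bag accuracy is reported as an internal performance estimate.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in with the same shape as the feature table (181 x 84).
X, y = make_classification(
    n_samples=181, n_features=84, n_informative=10, n_classes=3, random_state=0
)

# Number of features considered at each split under the square root rule.
mtry = max(1, int(np.sqrt(X.shape[1])))

rf = RandomForestClassifier(
    n_estimators=500,       # Ntree
    max_features="sqrt",    # Mtry = sqrt(number of features)
    bootstrap=True,         # each tree sees a bootstrap sample
    oob_score=True,         # estimate accuracy from out-of-bag samples
    random_state=0,
)
rf.fit(X, y)

print(f"Mtry under the square root rule: {mtry}")
print(f"number of trees: {len(rf.estimators_)}")
print(f"out-of-bag accuracy: {rf.oob_score_:.3f}")
```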
The number of trees in an RF affects the model’s ability to represent patterns characterized by a large number of training data or a large number of features in each training example. As the training set size or number of features increases, so too should the number of trees in the RF. Each tree in the RF has the potential to use any of the predictive features for classifying an example. However, each split in the tree is selected using a random subset of these predictors. The size of this subset is defined by the Mtry parameter. As Berhane et al. (2018) pointed out, while the RF model's behavior is generally resilient to changes in Ntree, it can be more sensitive to variations in Mtry. Reducing the Mtry parameter can lead to quicker training, but it also diminishes both the correlation between any two trees and the individual strength of each tree in the forest. Consequently, the value of this hyperparameter has a multifaceted impact on classification accuracy (Klusowski, 2018).

Given that the RF classifier is computationally efficient and tends not to overfit, it can accommodate a very large number of trees (Ntree) (Guan et al., 2013). However, numerous studies have identified 500 as an optimal number for Ntree, as further increases did not yield improvements in accuracy (Belgiu and Drăguţ, 2016). In contrast, the ideal value for Mtry depends on the specific dataset. For classification tasks, it is advised to set the Mtry parameter to the square root of the number of input features (Breiman, 2001).

This study uses the RF implementation in the scikit-learn library for Python. In this RF implementation, the hyperparameters Mtry and Ntree are named max_features and n_estimators, respectively. Other hyperparameters in the scikit-learn RF implementation include:

• Max_depth: Each tree's depth in the forest determines its complexity. Setting an explicit maximum depth prevents the tree from growing until every leaf is pure.
An overly deep tree might lead to overfitting; conversely, a shallow tree might underfit the data.

• Min_samples_split: The minimum number of data points that a node must contain before it can be split. For instance, if the value is set to 10, a split will be attempted only if the node contains at least 10 data points.

• Min_samples_leaf: The minimum number of data points that each node resulting from a split must contain. If a candidate split would leave fewer than 'min_samples_leaf' data points in either resulting node, the split is not made. This is a regularization hyperparameter that helps avoid overly specific leaves in the tree.

• Bootstrap: A core concept in the RF model, bootstrapping involves training each tree on a distinct subset of data. This subset, known as the "bag," is sampled with replacement from the entire dataset. The data points not in the bag are called "Out of Bag (OOB) samples." Aggregating the outputs from multiple such diversified trees helps reduce the model's variance and increases its robustness (Kelkar and Bakal, 2020).

These hyperparameters are critical to RF modeling and will be tuned to find the optimal set of hyperparameters for a given dataset (Probst et al., 2019).

Support Vector Machine (SVM)

First proposed by Vapnik and his team in the late 1970s, the SVM, a supervised non-parametric statistical learning technique, has become one of the most prevalent kernel-based learning methods in diverse machine learning tasks, notably in image classification (Vapnik, 2006). Fundamentally, the SVM is a linear binary classifier that delineates a single boundary between two categories. This linear SVM presupposes that the multi-dimensional data can be linearly divided in the input domain (as illustrated in Figure 4). Specifically, SVMs determine an optimal hyperplane (i.e., a surface in the multi-dimensional space defined by the input variables) to divide the dataset into specific pre-established classes, based on training data.
To ensure the largest separation, or margin, SVMs use the subset of the training data that lies nearest in the feature space to the optimal boundary; these samples are known as the support vectors (Foody and Mathur, 2004). The optimal boundary, often termed the "maximal margin" or "optimal hyperplane," is a key aspect of SVM. It represents a decision boundary designed to minimize errors when categorizing data during the training phase (Mountrakis et al., 2011). Referring to Figure 4, two parallel hyperplanes are chosen such that no data samples lie between them. The best separating hyperplane is identified by maximizing the distance between these two hyperplanes. This systematic approach is termed the learning process.

Figure 4: Illustration of an SVM classifier with a linear kernel applied to linearly separable data, highlighting the optimal separating hyperplane (solid line) and the margins (dashed lines) defined by the support vectors (circled points).

In real-world scenarios, data points from different classes may not always be cleanly separated, leading to overlaps, as depicted in Figure 5. Recognizing these challenges, Cortes and Vapnik (1995) introduced significant enhancements to SVM, notably the soft margin and the kernel trick. The soft margin method introduces slack variables into the SVM optimization process, allowing some training samples to violate the margin so that a linear boundary can still be found when the classes overlap. Concurrently, the kernel trick transforms the original data into a higher-dimensional space, making previously overlapping samples more distinct, as shown in Figure 6. The efficacy of SVM largely depends on the right choice of kernel function. Commonly used kernel functions include the sigmoid, radial basis function (RBF), polynomial, and linear kernels (Cherkassky and Ma, 2004). Specifically, the polynomial and RBF kernels are frequently used in analyzing remotely sensed images (Mountrakis et al., 2011).
Figure 5: Visualization of a non-linear SVM classification, showing how a non-linear boundary effectively separates two distinct classes in the feature space.

Figure 6: Illustration of the kernel trick applied to data in a 3D space, demonstrating the transformation that facilitates the separation of classes which are not linearly separable in the original dimensions.

Optimizing SVM Performance: Hyperparameter Tuning

The SVM hyperparameters, including the cost parameter C, the kernel parameters 𝛾 and degree, and the kernel function itself, require careful optimization.

• C (Cost parameter): C is a regularization parameter in SVM. It determines the trade-off between achieving a low error on the training data and maintaining a wide margin between classes. A small value of C creates a wider margin, which may result in more training errors. A large value of C aims for a smaller margin and fewer training errors. However, setting C too high might make the model overfit to the training data, reducing its ability to generalize to unseen data.
• 𝜸 (Gamma): 𝛾 is a parameter specific to the Radial Basis Function (RBF) kernel. It determines how far the influence of a single training sample reaches, and therefore how closely the model will fit the training data. A low 𝛾 value gives each training sample a broad range of influence, producing a smoother, more generalized solution. A high 𝛾 value considers only close points, producing a more closely fitted solution, but with a risk of overfitting.
• Degree: This parameter is specific to the polynomial kernel in SVM. It sets the degree of the polynomial function used, altering the complexity of the decision boundary. A higher degree polynomial can capture more complex relationships in the data but increases the risk of overfitting if the complexity is not warranted by the data structure.
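The roles of C, 𝛾, degree, and the kernel function can be explored with a small exhaustive search. The following is a minimal sketch using scikit-learn, not the configuration used in this thesis: the synthetic dataset, the candidate values in `param_grid`, and the choice of 5-fold cross-validation are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for a three-class feature table (assumption: 150 samples, 12 features).
X, y = make_classification(
    n_samples=150, n_features=12, n_informative=6, n_classes=3, random_state=0
)

# Illustrative candidate values only. Each dictionary pairs a kernel with the
# hyperparameters that are relevant to it: C for all kernels, gamma for RBF,
# degree for polynomial.
param_grid = [
    {"kernel": ["linear"], "C": [0.1, 1, 10]},
    {"kernel": ["rbf"], "C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]},
    {"kernel": ["poly"], "C": [0.1, 1, 10], "degree": [2, 3]},
]

# Every combination in the grid is scored by cross-validation (cv=5 is an assumption).
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)

best_params = search.best_params_   # hyperparameter combination with the best mean CV score
best_score = search.best_score_     # mean cross-validated accuracy of that combination
n_candidates = len(search.cv_results_["params"])
print(best_params, round(best_score, 3), n_candidates)
```

Listing one dictionary per kernel keeps the search from evaluating combinations that have no effect, such as varying the polynomial degree for a linear kernel.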
These hyperparameters are pivotal in defining the decision boundaries of SVM classifiers and strongly affect the model's capacity for generalization. Careful tuning of these parameters is therefore critical, as they have a profound effect on the SVM's performance with both training and unseen datasets. This research tunes the aforementioned hyperparameters, with detailed values and results presented in Chapter 3.

Optimizing Machine Learning Performance: Grid Search

To ensure the highest performance of the machine learning models, it is crucial to find the right combination of hyperparameters. This work uses a grid search method, as detailed by Ataei and Osanloo (2004) and Probst et al. (2019). Hyperparameter tuning via grid search is an exhaustive method that systematically evaluates the training performance of the machine learning model across every possible combination of the hyperparameter values provided. The combination of hyperparameters that yields the best training performance is selected as the optimal set for the model (Ataei and Osanloo, 2004). Although this approach is time consuming, its simplicity of implementation and exhaustive nature make it a preferred choice for many researchers and practitioners. This is because it provides confidence that the selected hyperparameters are indeed the best among the provided set, leading to reliable and robust machine learning model performance (Probst et al., 2019).

Gray Level Co-occurrence Matrix (GLCM) Based Texture Analysis

Many methods have been proposed for extracting textural features. One such method, which is used in this research, is the gray level co-occurrence matrix (GLCM) (Roberti de Siqueira et al., 2013). In image processing, and in feature extraction in particular, understanding the GLCM is crucial.
This technique has formed the backbone of many significant advancements in this field (Hall-Beyer, 2017; Mohanaiah et al., 2013; Öztürk and Akdemir, 2018; Xian, 2010; Zulpe and Pawar, 2012). At the most basic level, a GLCM encapsulates spatial patterns within a black-and-white image. These spatial patterns can be summarized using statistics to create textural features that can aid in image analysis, such as the classification of multispectral images. Haralick et al. (1973) introduced GLCM-based texture features, and since then, they have found use in many remote sensing image analyses, especially for detecting invasive plant species (Baron et al., 2018; Baron and Hill, 2020; Dorigo et al., 2012; Li et al., 2019; Pearlstine et al., 2005).

A GLCM is a square matrix that provides insights into the spatial distribution of gray levels within an image. The term gray level refers to the pixel value in a single-spectral-band image (i.e., a black-and-white image). The number of rows and columns in a GLCM is defined by the number of gray levels in the image. Thus, an image where the gray tone is encoded as an 8-bit unsigned integer will have 256 rows and 256 columns. Each element of a GLCM represents the relative frequency of occurrences of two pixels, one with gray tone 𝑖 and the other with gray tone 𝑗, separated by a distance 𝑑 and oriented according to the angular relationship (𝑎) to each other. Thus, the co-occurrence pattern captured by the GLCM depends on both the separation distance (𝑑) and the angular relationship (𝑎) between the pixels. The angular relationship between pixels can be defined by four different directions across an image: horizontal, vertical, left diagonal, and right diagonal, as illustrated in Figure 7. Because these angular relationships are reciprocal, the resultant GLCM is symmetric. That is, a pixel pair counted in one direction (e.g., up) is also counted in the opposite direction (e.g., down), whether the relationship is horizontal, vertical, or diagonal.

Figure 7: Illustration of the four primary directions (0°, 45°, 90°, and 135°) used in the computation of Haralick texture features for GLCM analysis, depicting the relative positioning of pixel pairs.

Within this framework, various textural features emerge, and this study focuses on five: Angular Second Moment (ASM), Correlation (COR), Entropy (ENT), Sum Entropy (SENT), and Difference Entropy (DENT). These features were selected because they are invariant to gray-tone transformations (Gonzalez et al., 2008; Haralick et al., 1973), and thus are expected to be less sensitive to changes in illumination (e.g., shadows) and calibration errors.

• ASM quantifies the uniformity of the distribution of gray levels separated by a distance 𝑑 in the direction 𝑎 in the image.
• COR is a measure that indicates the presence of linear dependencies between gray levels separated by a distance 𝑑 in the direction 𝑎 in an image. It provides insights into the relationship between the rows or columns of the GLCM and their degree of association with each other (Conners and Harlow, 1980; Kekre et al., 2010; Tahir et al., 2003).
• ENT is calculated by assessing the probability of occurrence of a pixel with a certain intensity next to a pixel with another intensity. A high entropy value from the GLCM indicates a high degree of complexity and variability in the image texture, suggesting a less uniform and more detailed pattern. Conversely, a low entropy value implies more homogeneity and less detail, reflecting a more predictable texture pattern (Baraldi and Parmiggiani, 1995; Haralick et al., 1973; Haralick and Shanmugam, 1973).
• SENT assesses the complexity in an image by examining the sum of intensities of pixel pairs. Higher values indicate more complexity or texture information in the image (Haralick et al., 1973).
• DENT is a measure of the variability in the differences between the gray levels of pixel pairs. Instead of looking at how often pairs of gray levels occur together, as with the GLCM entries themselves, DENT examines the frequency distribution of the absolute differences between the gray levels of each pixel pair. It calculates the entropy of this difference distribution, capturing the contrast of the texture. A high DENT value indicates greater complexity or variability in texture contrast, while a low value suggests less contrast and more uniformity in the texture of the image (Haralick et al., 1973).

This analytical method allows for a deeper understanding of image content, facilitating advancements in areas like medical imaging, remote sensing, and even automated quality inspection in manufacturing (Roberti de Siqueira et al., 2013).

Mathematical Representation of GLCM-based Features

These features can be calculated using the equations below.

Distribution of Gray Levels along the Horizontal Axis:

\[ p_x(i) = \sum_{j=1}^{N_g} p(i, j) \tag{10} \]

where $i$ and $j$ are gray levels that index the rows and columns of the GLCM, respectively, $N_g$ is the number of possible gray levels, and $p(i, j)$ is the $(i, j)$-th entry of the normalized GLCM, that is, the probability that a pixel with gray level $i$ co-occurs with a pixel with gray level $j$ at the chosen distance and direction. $p_x(i)$ is the marginal probability obtained by summing the $i$-th row of the normalized GLCM.

Mean of $p_x$:

\[ \mu_x = \frac{1}{N_g} \sum_{i=1}^{N_g} p_x(i) \tag{11} \]

Standard deviation of $p_x$:

\[ \sigma_x = \sqrt{\frac{1}{N_g - 1} \sum_{i=1}^{N_g} \left( p_x(i) - \mu_x \right)^2} \tag{12} \]

Distribution of Gray Levels along the Vertical Axis:

\[ p_y(j) = \sum_{i=1}^{N_g} p(i, j) \tag{13} \]

where $p_y(j)$ is the marginal probability obtained by summing the $j$-th column of the normalized GLCM.

Mean of $p_y$:

\[ \mu_y = \frac{1}{N_g} \sum_{j=1}^{N_g} p_y(j) \tag{14} \]

Because of the assumption of symmetry in the angular relationships considered in this work, $\mu_x = \mu_y$.
Standard deviation of $p_y$:

\[ \sigma_y = \sqrt{\frac{1}{N_g - 1} \sum_{j=1}^{N_g} \left( p_y(j) - \mu_y \right)^2} \tag{15} \]

Because of the assumption of symmetry in the angular relationships considered in this work, $\sigma_x = \sigma_y$.

Distribution of Gray Level Sum:

\[ p_{x+y}(k) = \sum_{i=1}^{N_g} \sum_{\substack{j=1 \\ i+j=k}}^{N_g} p(i, j), \qquad k = 2, 3, \ldots, 2N_g \tag{16} \]

$p_{x+y}(k)$ is the probability distribution of the sum of the gray levels of a pixel pair, and gives the likelihood of encountering a specific sum of gray levels in the image.

Distribution of Gray Level Difference:

\[ p_{x-y}(k) = \sum_{i=1}^{N_g} \sum_{\substack{j=1 \\ |i-j|=k}}^{N_g} p(i, j), \qquad k = 0, 1, \ldots, N_g - 1 \tag{17} \]

$p_{x-y}(k)$ is the probability distribution of the absolute difference between the gray levels of a pixel pair.

Angular Second Moment (ASM):

\[ ASM = \sum_{i=1}^{N_g} \sum_{j=1}^{N_g} \left\{ p(i, j) \right\}^2 \tag{18} \]

where $p(i, j)$ represents the probability (or normalized frequency) that a pixel with gray level $i$ is adjacent to a pixel with gray level $j$ in a particular direction (e.g., horizontal, vertical, diagonal). Squaring the probabilities emphasizes the higher probabilities, so that the ASM provides a measure of uniformity.

Correlation (COR):

\[ COR = \frac{\sum_{i=1}^{N_g} \sum_{j=1}^{N_g} (i \cdot j)\, p(i, j) - \mu_x \mu_y}{\sigma_x \sigma_y} \tag{19} \]

Entropy (ENT):

\[ ENT = - \sum_{i=1}^{N_g} \sum_{j=1}^{N_g} p(i, j) \log_2 \left[ p(i, j) + \varepsilon \right] \tag{20} \]

where $\varepsilon$ is a small constant that keeps the logarithm defined (the logarithm of zero is undefined) and therefore keeps the computation stable when $p(i, j)$ is zero.

Sum Entropy (SENT):

\[ SENT = - \sum_{i=2}^{2N_g} p_{x+y}(i) \log_2 \left[ p_{x+y}(i) + \varepsilon \right] \tag{21} \]

where $\varepsilon$ is the same small constant, which keeps the logarithm defined when $p_{x+y}(i)$ is zero.
Difference Entropy (DENT):

\[ DENT = - \sum_{i=0}^{N_g - 1} p_{x-y}(i) \log_2 \left[ p_{x-y}(i) + \varepsilon \right] \tag{22} \]

where $\varepsilon$ is a small constant that keeps the logarithm defined (the logarithm of zero is undefined) and therefore keeps the computation stable when $p_{x-y}(i)$ is zero.

In this research, textural features are derived using a separation distance of $d = 1$. To represent these directional features in a rotation-invariant manner, each feature is computed for the four directions (vertical, horizontal, diagonal-up, diagonal-down), and the four directional values are summarized using mean and range statistics (Sebastian et al., 2012).

Introduction of Scale-Space Theory and the Gaussian Pyramid

Scale-space theory is a fundamental pillar of image processing, enabling a multi-scale representation of images, particularly when interpreting multispectral imagery using a Gaussian pyramid (GP). The essence of this research revolves around understanding how multi-scale attributes affect image classification.

Importance of Multi-scale Analysis

Objects and landscapes present in our environment inherently possess multi-scale characteristics. Depending on the observation scale, these objects might exhibit varying appearances. Similarly, biological vision systems display different levels of visual processing, each corresponding to a specific scale of information. As automated algorithms for image interpretation in novel scenes evolve, the challenge of determining relevant scales without prior knowledge becomes paramount. To address this, scale-space theory advocates for simultaneous representation across all scales, offering a structured methodology to represent an image through a series of smoothed or blurred versions spanning different scales (Florack et al., 1992; Koenderink, 1984; Lindeberg, 2009, 1995, 1990; Romeny, 2008; Witkin, 1983).
Axiomatic Derivations and the Gaussian Approach

A core tenet of scale-space theory, based on axiomatic derivations, posits that representations at coarser scales should be simplified versions of their finer-scale counterparts (Lindeberg, 2013). Such a guideline naturally suggests a specific class of image operators: convolution using Gaussian kernels and their derivatives. These operators not only capture varying scale information but also retain pertinent image structures (Lindeberg, 2020; Mikolajczyk, 2002). The Gaussian-based approach thus emerges as an efficient and robust technique for a wide array of visual processing tasks. Its applications range from feature detection, classification, image-based pattern recognition, and image segmentation to enhancement via deblurring (Kuijper et al., 2003).

With its foundations in both physics and biological vision, scale-space theory provides a cohesive methodology for computer vision. By offering a systematic, multi-scale analysis tool, this theory has gained traction and widespread application in various computer vision tasks. As technology continues to evolve, the principles of scale-space theory are expected to remain integral to future developments in the field of image processing and interpretation (Chomat et al., 2001; Florack, 1997; Henkel, 1995; Hummel et al., 1987; Kalitzin et al., 1997; Réti, 1995).

Gaussian Convolution and Scale-Space

Gaussian convolution plays a central role in the creation of scale-space representations by enabling the construction of a set of multi-scale images that capture image structures at various levels of granularity. The process involves applying a Gaussian kernel to the original image at different scales. This results in a series of increasingly smoothed versions of the image, each representing a different level of blurriness (Kuijper et al., 2003).
The two-dimensional Gaussian function used to define a kernel is:

\[ g(x, y; \sigma, \bar{x}, \bar{y}) = \frac{1}{2\pi\sigma^2} \exp\left( -\frac{(x - \bar{x})^2 + (y - \bar{y})^2}{2\sigma^2} \right) \tag{23} \]

where $\sigma$ is the standard deviation of the Gaussian function, $\bar{x}$ and $\bar{y}$ are the x- and y-coordinates of the center of the Gaussian function, and $1 / (2\pi\sigma^2)$ is the normalization factor that ensures that the integral of the Gaussian function over the entire domain is equal to 1. The term $(x - \bar{x})^2 + (y - \bar{y})^2$ is the squared distance from the center of the function, and the negative sign ensures that the function decreases as the distance from the center increases. In Figure 8, a 2D Gaussian distribution is depicted with $\bar{x} = 0$, $\bar{y} = 0$, and $\sigma = 1$.

Figure 8: 3D representation of a 2D uniform Gaussian kernel, visualizing the symmetrical distribution and peak concentration at the origin.

The family of Gaussian kernels has several properties that facilitate its use for data smoothing, namely, linearity, separability, causality, and the semi-group property (Florack et al., 1992; Koenderink, 1984; Lindeberg, 1995; Sporring et al., 2013; Witkin, 1983). Of particular relevance is the property of separability, which allows an n-dimensional Gaussian kernel to be derived as the product of n one-dimensional kernels (Mikolajczyk, 2002). This property can be represented mathematically as:

\[ g(x, y) = g(x)\, g(y) \tag{24} \]

where

\[ g(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(x - \bar{x})^2}{2\sigma^2} \right) \quad \text{and} \quad g(y) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(y - \bar{y})^2}{2\sigma^2} \right) \]

so that the product of the two one-dimensional normalization factors recovers the factor $1 / (2\pi\sigma^2)$ in Equation 23. The separability property reduces the computational complexity of convolutions and contributes to efficient image processing algorithms because it allows the smoothing of an image to be accomplished through two separate one-dimensional smoothing steps, each applied to one dimension of the image.

Typically, the creation of different levels in the scale-space representation involves convolving the image with the Gaussian kernel:

\[ L(p; \sigma) = g(\sigma) \ast I(p) \tag{25} \]

where $\ast$ is the convolution operator, $I$ is the image, and $p = (x, y)$ is the point location.
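The separability property in Equation 24 can be checked numerically. The sketch below, written with NumPy only, smooths a small random image in two ways: with two one-dimensional passes and with the full two-dimensional kernel. The function names, the kernel radius, the zero padding at the image border, and the renormalization of the sampled kernel so that its weights sum to 1 are all assumptions made for illustration.

```python
import numpy as np

def gaussian_1d(sigma, radius):
    """Sampled 1D Gaussian centered at 0, renormalized so the weights sum to 1."""
    x = np.arange(-radius, radius + 1)
    g = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return g / g.sum()

def convolve_rows(image, kernel):
    """Convolve every row with a 1D kernel (zero padding, output size unchanged)."""
    return np.apply_along_axis(lambda row: np.convolve(row, kernel, mode="same"), 1, image)

def smooth_separable(image, sigma, radius):
    """Two 1D passes: first along the rows, then along the columns."""
    g = gaussian_1d(sigma, radius)
    return convolve_rows(convolve_rows(image, g).T, g).T

def smooth_direct(image, sigma, radius):
    """Single pass with the full 2D kernel g(x) g(y), using zero padding."""
    g = gaussian_1d(sigma, radius)
    kernel_2d = np.outer(g, g)
    padded = np.pad(image, radius)  # constant (zero) padding on all sides
    out = np.zeros(image.shape, dtype=float)
    size = 2 * radius + 1
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + size, j:j + size] * kernel_2d)
    return out

# Hypothetical 16 x 16 test image.
img = np.random.default_rng(0).random((16, 16))
separable = smooth_separable(img, sigma=1.0, radius=3)
direct = smooth_direct(img, sigma=1.0, radius=3)
print(np.allclose(separable, direct))
```

For a kernel of width 2r + 1, the direct approach needs (2r + 1)² multiplications per pixel, whereas the separable approach needs 2(2r + 1), which is why separability matters most for large kernels.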
The Gaussian kernel, $g$, is circularly symmetric and is parameterized by a single scale factor, denoted as $\sigma$. The separability property of Gaussian kernels permits a two-dimensional Gaussian kernel to be decomposed into two orthogonal, one-dimensional filters, which significantly reduces the computational complexity (Mikolajczyk, 2002). Furthermore, a one-dimensional Gaussian kernel can be implemented using a recursive filter (Deriche, 1993). This recursive approach offers notable computational efficiency, particularly when dealing with larger Gaussian kernels (e.g., kernels which operate on a wide area of neighbouring pixels) (Mikolajczyk, 2002).

Gaussian Pyramids

The GP is a foundational tool for creating multi-scale representations of images and has been widely used in image processing and computer vision (Haddad and Akansu, 1991; Konlambigue et al., 2018; Li et al., 2018; Olkkonen and Pesola, 1996; Sporring et al., 2013). These pyramids have been recognized for their efficiency and wide applicability in tasks such as image compression, segmentation, and object detection (Adelson et al., 1984; Mpinda Ataky et al., 2020; Olkkonen and Pesola, 1996). The primary goal of a Gaussian pyramid is to facilitate the analysis of images at different resolutions. This is achieved by constructing copies of an image at different levels of detail and scale. The pyramid itself is constructed by stacking images with varied resolutions: starting with the original image at the base and progressively scaling it down to the top, which becomes a single-pixel representation indicating the average value of the entire image (see Figure 9). This layered representation not only reduces noise in the image as pyramid levels increase but also enhances its smoothness, making it valuable for various image processing applications (Li et al., 2018). Constructing such a pyramid involves a sequence of operations.
Initially, the image undergoes convolution with a Gaussian kernel, which is typically centered on a pixel or a group of pixels. The standard deviation of the kernel dictates the degree of image blurring (Binaghi et al., 2003; Chaudhuri and Marron, 2000). After blurring, the image is downsampled. Here, every 2x2 block of pixels in the image is averaged, which does not alter the image's spatial extent. Instead, this method decreases the image's resolution by creating pixels with spatial footprints that are four times larger. The result is a halving of the image's dimensions, transitioning an M×N image to an M/2×N/2 version and thus reducing the number of pixels to one-fourth of the original, a process often termed an octave or level (Li et al., 2018). This process is depicted in Figure 9, which illustrates the progression from the original image at l0 to the fifth level of the Gaussian pyramid (l4).

Figure 9: Gaussian Pyramid. The image illustrates five levels of the GP, spanning from the original image at level l0 to the fifth level, l4.

As depicted in Figure 9, images of varying resolutions can be visualized as a stacked structure, called a pyramid (Adelson et al., 1984). At the base of this structure is the original image (G0), which has the highest spatial resolution of all the images in the pyramid, whereas the lowest resolution image (GN) is found at the apex of the pyramid. Each image (Gl) in the pyramid is referred to as inhabiting a level (l) in the pyramid, where the zeroth level corresponds to the original image at the base of the pyramid. As the level in the pyramid increases, the image resolution decreases, culminating in the Nth level, which corresponds to the lowest resolution image at the apex. To create the image at the l-th level in the pyramid, the image at the (𝑙 − 1)-th level is convolved with the Gaussian kernel and the resulting blurred image is then resized.
Thus, $G_l(x, y)$ is derived from $G_{l-1}(x, y)$, with each step reducing the image's resolution:

\[ G_l(x, y) = \sum_{m=-T}^{T} \sum_{n=-T}^{T} w(m, n)\, G_{l-1}(2R_x x + m,\ 2R_y y + n) \tag{26} \]

In this equation:

• $G_l(x, y)$ signifies the pixel value at location $(x, y)$ in the $l$-th level of the Gaussian pyramid.
• $G_{l-1}(2R_x x + m,\ 2R_y y + n)$ represents the pixel value from the previous level ($l - 1$) of the Gaussian pyramid. The terms $2R_x x + m$ and $2R_y y + n$ adjust the coordinates for downsampling, with $R_x$ and $R_y$ being the respective scaling factors (downsampling ratios) for the $x$ and $y$ axes.
• The weight $w(m, n)$ is derived from a predefined Gaussian kernel, which weights each pixel according to its proximity to the kernel center. It functions as a smoothing filter, reducing rapid pixel value changes across the image. When convolved with an image, it averages pixel values within its domain, generating a blurring effect.
• $T$ determines the limit, or radius, of the kernel in each direction. Because $m$ and $n$ range from $-T$ to $T$, the kernel is of size $(2T+1) \times (2T+1)$.

The double summation carries out the convolution of the kernel with the image at the downsampled locations.

Digital images discretize space into pixels; thus, to create a GP, the Gaussian kernel must be discretized. The Gaussian kernel used in this work is represented as (Bradski, 2000):

\[ w = \frac{1}{256} \begin{bmatrix} 1 & 4 & 6 & 4 & 1 \\ 4 & 16 & 24 & 16 & 4 \\ 6 & 24 & 36 & 24 & 6 \\ 4 & 16 & 24 & 16 & 4 \\ 1 & 4 & 6 & 4 & 1 \end{bmatrix} \]

Each value in the matrix corresponds to a weight, and when this kernel is convolved with an image, it gives a weighted average of the pixel values in the neighborhood defined by the kernel. This kernel gives the most weight to the center pixel, and increasingly less weight as the distance from the kernel center increases, leading to a Gaussian or bell-curve distribution of weights. The factor 1⁄256 normalizes the kernel, ensuring that the sum of all weights is 1.
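The 5×5 kernel and the reduction step in Equation 26 can be sketched with NumPy as follows. This is an illustration under stated assumptions rather than the OpenCV routine used in this work: the function name `reduce_level` is hypothetical, the scaling factors are taken as $R_x = R_y = 1$ so that each level halves the image dimensions, odd dimensions are rounded up, and borders are handled by reflection.

```python
import numpy as np

# The 5x5 kernel is the outer product of the 1D weights [1, 4, 6, 4, 1] / 16,
# which reproduces the matrix above with its 1/256 normalization.
w1d = np.array([1, 4, 6, 4, 1], dtype=float) / 16.0
kernel = np.outer(w1d, w1d)

def reduce_level(image, kernel):
    """One pyramid step: weighted average around every second pixel (Equation 26)."""
    T = kernel.shape[0] // 2                                  # kernel radius, here T = 2
    padded = np.pad(image.astype(float), T, mode="reflect")   # border handling is an assumption
    rows = (image.shape[0] + 1) // 2                          # odd sizes are rounded up
    cols = (image.shape[1] + 1) // 2
    out = np.empty((rows, cols))
    size = 2 * T + 1
    for x in range(rows):
        for y in range(cols):
            # Window centered on pixel (2x, 2y) of the previous level.
            window = padded[2 * x:2 * x + size, 2 * y:2 * y + size]
            out[x, y] = np.sum(kernel * window)
    return out

# Hypothetical 40 x 40 single-band image; build levels l0 to l4.
image = np.random.default_rng(0).random((40, 40))
pyramid = [image]
for _ in range(4):
    pyramid.append(reduce_level(pyramid[-1], kernel))

print([level.shape for level in pyramid])
```

Because the weights sum to 1, a constant image remains constant at every level, which is a quick sanity check on the normalization.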
Figure 2-6 shows an example of a Gaussian pyramid constructed from a single spectral-band (green) image, which was captured by an RPAS at the Laurie Guichon Memorial Grasslands Interpretive Site. Level 0 (l0) in this pyramid corresponds to the original grey-scale image. The subsequent levels are constructed by applying equation 24 and downsampling, which involves reducing the number of pixels by removing even rows and columns. Thus, to create the first layer, l1, the Gaussian kernel is applied to the original image (l0). The result is then downsampled so that each pixel represents an area 4x larger on the ground. Here, l is an element of the set {0, 1, 2, 3, …, N}, where N is the index of the highest level in the Gaussian pyramid.

Conclusion

In this chapter, the methods that are central to the research have been discussed. Moving forward, the approach will involve using RF and SVM classifiers, both of which are trained using a 10-fold cross-validation method. Hyperparameters for these models will be identified using grid search. To enhance the spectral information present in the RPAS-acquired imagery, GLCM texture features will be created. Additionally, GPs will be employed to achieve a multi-scale representation of the images, enabling a thorough analysis across different scales. Features will be selected from these GP representations of the RPAS-acquired imagery using the two-step RFECV method described above. The subsequent chapter will detail how these methods are applied to map spotted knapweed in a grassland ecosystem.

Chapter 3 Invasive Species Mapping: A Case Study on Spotted Knapweed Detection in Grassland Ecosystems

Introduction

The primary objective of this work is to evaluate the impact of image spatial resolution when mapping spotted knapweed in a grassland ecosystem. The data used for this study were collected using a consumer-grade remotely piloted aircraft system (DJI Phantom 4) and multispectral imager (Parrot Sequoia).
Processing the raw imagery using Pix4D Pro (Pix4Dmapper, 2018) resulted in four images with a spatial resolution of 2.9 cm, representing the green, red, red-edge, and near-infrared spectral bands. From these images, three vegetation index (NDVI, reNDVI, and gNDVI) images (also with a 2.9 cm spatial resolution) were created. These datasets are described in greater detail in Section 1.8 of this thesis and in Baron (2020).

This chapter will discuss how Gaussian pyramids (GPs) were used to create a multi-scale representation of these seven images, how features were extracted from these images, and how these features were used to create machine-learning classifiers using both random forests (RF) and support vector machines (SVM). The outcome of classifiers built with features calculated from different spatial resolution images will be analyzed to reveal scale dependencies in feature sets. Finally, classifiers built using features from all spatial scales will be compared to evaluate the importance of multi-scale feature sets in this classification task. A key aspect of this discussion is the comparison of the data processing framework developed in this work, which includes scale optimization, with the framework employed in previous work (Baron and Hill, 2020), highlighting the intention of this study to evaluate scale-space feature analysis.

Methods

All analyses were conducted using PyCharm 2023.2.2, an Integrated Development Environment (IDE) for the Python programming language. Image processing was implemented with the Scikit-Image 0.20.0 library (Van der Walt et al., 2014), and machine learning, specifically Random Forest (RF) and Support Vector Machine (SVM), with the Scikit-Learn 1.2.2 library (Pedregosa et al., 2011). GP images were created using the OpenCV library 4.5.4.58 (Bradski, 2000).
Data management and analysis were also performed with Pandas 1.4.0, an open-source Python library that provides high-performance, user-friendly data structures and data analysis tools (McKinney et al., 2010). Numerical computations, particularly on arrays and matrices, were handled by NumPy 1.21.4, a Python module known for its rapid computational capabilities (Harris et al., 2020). The Mahotas library, which is dedicated to computer vision and image processing, was used to compute the Gray Level Co-occurrence Matrix (GLCM) (Coelho, 2013). Finally, visualization and mapping of the results were done using ArcMap 10.7.2.

Feature Extraction

Gaussian Pyramids

The GP of an image consists of a series of increasingly blurred images constructed from the original image. The original image constitutes level 0 in the image pyramid, and at each higher level in the pyramid, the image resolution decreases while the degree of blurring increases. The blur-and-downsample operation was performed four times to generate four additional levels of the Gaussian pyramid, from level 0 (l0), which constitutes the original image (40x40 pixels), to level 4 (l4), the fifth level of the GP, which has 3x3 pixels. Figure 10 shows the Gaussian pyramid constructed for a green band image used in this study. The GP method was applied to the four spectral band images (Green, Red, NIR, Red-edge) and the three multiband vegetation index images (NDVI, gNDVI, reNDVI) using the OpenCV library (Bradski, 2000). This created a set of 7 image GPs, with 5 levels each. Level 0 of each of these GPs is populated by the spectral image produced from the RPAS-acquired data by Pix4D (or a multiband vegetation index based on these data), and has a ground sampling distance (GSD; i.e., the on-the-ground footprint of a pixel) of 2.9 cm. For each successive level in the pyramid, the GSD increases by a factor of 2. The GSD at each level in the constructed pyramids is listed in Table 3.

Figure 10: Gaussian Pyramid.
The image illustrates five levels of the GP, spanning from the original image at level l0 to the fifth level, l4. Note that the extent of the physical domain represented by each image stays the same despite the reduction in the number of pixels constituting the image.

Table 3: Relation between each level of GP and GSD.

GP Level    Ground Sample Distance (GSD)
Level 1     2.9 cm × 2 = 5.8 cm
Level 2     5.8 cm × 2 = 11.6 cm
Level 3     11.6 cm × 2 = 23.2 cm
Level 4     23.2 cm × 2 = 46.4 cm

GLCM Features

This work uses meta-pixel-based image analysis, as presented by Baron and Hill (2019), to identify the relative abundance of spotted knapweed in the study area. In this method, a chessboard segmentation is applied to divide each image into a set of non-overlapping squares, called metapixels, which are the same size as a field-survey quadrat (i.e., 1 m²). Features to describe these metapixels are then calculated from the image pixels within the metapixel boundaries. In this work, 12 features were extracted from each metapixel. These features are listed in Table 4 and include the mean and standard deviation of the pixel values and ten GLCM-based texture features. The 12 features were extracted for each metapixel in each of the 7 GPs, resulting in a set of 84 features describing each metapixel per level in the GPs. Across all 5 levels in the GPs, this amounts to a set of 420 features per metapixel.

Table 4: GLCM Extracted Features.

Extracted Features
Mean of a pixel value
Standard deviation of a pixel value
Mean ASM
Range ASM
Mean Entropy
Range Entropy
Mean Sum Entropy
Range Sum Entropy
Mean Difference Entropy
Range Difference Entropy
Mean Correlation
Range Correlation

Feature Compilation

Following the work of Baron and Hill (2020), the relative abundance of spotted knapweed within each surveyed quadrat was represented as a qualitative class rather than a quantitative proportion.
This was done to ensure that the dataset contained sufficient examples of each abundance level of spotted knapweed to train a classifier. Three non-overlapping classes, 'None', 'Moderate' and 'High', were defined. The definitions of these three classes, as well as the number of cases of each class in the dataset, can be found in Table 5. For more information, see Baron et al. (2020) and Baron (2020) or Section 1.8 of this thesis.

Table 5: Classification of Spotted Knapweed Abundance in Surveyed Sites.

Abundance of Knapweed                       Number of Cases    Qualitative Class
Either absent or present in trace amounts   51                 None
No more than 25% cover                      63                 Moderate
Exceeding 25%                               67                 High

The datasets for training and validating the classifiers in this study were generated by extracting 420 multi-scale features from metapixels delineated by the boundaries of 93 survey quadrats across three sites (31 quadrats were measured at each field site). These data were then split into two distinct subsets: a training set and a validation set. The validation set consisted solely of data collected on July 4, 2018, from the 31 quadrats in Site 3, while the training set comprised data from the remaining dates and sites (Site 1 on July 4th and 19th, Site 2 on July 4th and 12th, and Site 3 on July 12th and 19th).

To enhance the performance of the classifiers, stratified random sampling was used to construct a balanced training set (Zhu et al., 2016). A balanced training set has an equal number of examples representing each outcome class. In this work, stratified random sampling was used to oversample the minority class and undersample the majority classes to equalize the sample sizes across all classes (Ma et al., 2015). The training sets created by stratified random sampling were used to train both the RF and SVM classifiers.
This approach ensured that each classifier was trained using the same set of training examples, allowing for a controlled comparison of their performance. The training set consisted of 150 samples in total: 50 in the 'High' category, 50 in the 'Moderate' category, and 50 in the 'None' category. Conversely, the validation set comprised 31 samples, with 17 in the 'High' category, 13 in the 'Moderate' category, and one in the 'None' category.

Model Creation

Classifiers based on Support Vector Machines (SVM) and Random Forests (RF) were developed to predict the relative abundance of spotted knapweed within a metapixel, utilizing the features specified in Table 4 calculated for each of the spectral band and vegetation index GPs. An SVM and an RF classifier were developed using training data from each of the 5 levels in the data GPs. These classifiers are designated GP0 through GP4, corresponding to the pyramid levels from zero to four. Another pair of SVM and RF classifiers, hereafter indicated as 'GPs', was constructed using a feature set that concatenates the features across all the levels in the GPs.

Classifier construction began with feature selection to identify an optimal feature subset for classification. Recursive feature elimination driven by an RF classifier with 500 trees was used to identify the optimal feature set. Because of stochasticity in the RF training process, optimal feature selection was performed using a two-step ensemble method. The first step used recursive feature elimination to identify the optimal number of features (Nopt). This process was repeated 20 times, and the average cross-validation score was tabulated for each possible number of features from 1 to N, where N is the size of the entire feature set. The optimal number of features was identified as the number of features that generated the highest average cross-validation score. The second step used recursive feature elimination to identify the optimal feature set of size Nopt.
This step was also repeated 20 times. These 20 feature sets were analyzed to find the most frequently selected features. The Nopt features that were selected most frequently in these 20 iterations were declared to be the optimal feature set.

Once the optimal feature subset was identified, these features were used along with a grid search to identify the optimal hyperparameter settings for the training process. The hyperparameters that were tuned, and the range of values considered by the grid search during this process, are listed in Tables 8 and 9 for the RF and SVM classifiers, respectively.

Finally, the model was trained using the optimal feature set and hyperparameters. This training was conducted via 10-fold cross-validation, and the best-performing model was retained as the final trained classifier. Each classifier was then tested using the independent validation data set consisting of data collected from Site 3 on July 4. This approach ensures that the classifier's ability to generalize is assessed using data that were not involved in the tuning or training processes. The analysis of the classifiers' performance is based on the performance metrics shown in Table 6 (Murphy, 2012).

Table 6: Classifier performance metrics used to evaluate classifier performance.

Classifier performance metrics
Accuracy
Precision
Recall
F1-score
Macro Average
Weighted Average
Support

A flow chart of the classifier training and testing process is illustrated in Figure 11.

Figure 11: Procedure for classifying spotted knapweed using RF and SVM classifiers.

Results

Feature Selection

Model development began with identifying optimal features using the RFECV method, facilitated by an RF classifier composed of 500 trees. Figures 12 to 17 show the results of the RFECV process as the average cross-validation score versus the number of features for each of the 6 feature sets (GP0, GP1, GP2, GP3, GP4, and GPs).
In these plots, each point indicates the average cross-validation score over the 20 iterations of the RFECV process. These data are used to select the optimal number of features in each feature set by identifying the feature set size that optimizes the average cross-validation score.

Figure 12 illustrates the RFECV results for the GP0 data. The curve indicates the relationship between the number of features and the model's mean CV score. Starting from a low number of features, there is a notable rise in the CV score, which rapidly increases and then levels off as more features are added. The peak of this curve is highlighted at the point '21, 0.534', where the model achieves the best performance with 21 features. Beyond this peak, the CV score tends to plateau, showing a consistent trend with minor fluctuations around this peak value. This plateau suggests that adding more than 21 features does not significantly improve the model's predictive capability and that the model has reached a balance between feature complexity and performance. The shape of the curve is characteristic of an initial gain in performance with additional features, followed by stabilization, indicating that further feature additions are not contributing to the predictive strength of the model.

Figure 12: RFECV Results for GP0, showing the optimal selection of 21 features. The x-axis represents the number of features retained, and the y-axis depicts the average of the mean cross-validation scores, as calculated over 20 iterations.

Figure 13 illustrates the RFECV results for the GP1 data. The curve begins with a sharp increase, where the mean CV score swiftly rises from the lower number of features, reaching a prominent peak at '16, 0.567'. This suggests that at 16 features, the model attains its highest level of predictive accuracy.
Beyond this point, the performance sharply declines, demonstrating that additional features detract from the model's effectiveness. Following this decline, the curve levels off, signifying that further inclusion of features fails to significantly enhance the mean CV score. This leveling persists throughout the rest of the plot, marked by minor fluctuations but lacking any substantial upward movement. Such a pattern suggests that expanding the feature set beyond the optimal count of 16 leads to diminished performance. The curve's trajectory is indicative of a typical phenomenon in feature selection, whereby the marginal gain of adding extra features eventually plateaus or even reverses, reflecting the trade-off between model complexity and generalization.

Figure 13: RFECV Results for GP1, showing the optimal selection of 16 features. The x-axis represents the number of features retained, and the y-axis depicts the average of the mean cross-validation scores, as calculated over 20 iterations.

Figure 14 illustrates the RFECV results for the GP2 data. The curve starts with a steep ascent in the mean CV score, reaching an early peak at the point '4, 0.582'. This indicates that the model's predictive accuracy is maximized with just 4 features. Following this initial peak, the mean CV score decreases before plateauing, suggesting that additional features beyond the optimal four serve to decrease the model performance. The plateau is characterized by a consistent average score with minor fluctuations below the peak value, reinforcing the notion that a small, concise set of features is sufficient for the model to achieve its best performance. The profile of the curve demonstrates the principle of parsimony in model building, where simpler models with fewer features may yield the best generalization.
It also illustrates the concept of diminishing returns in feature inclusion: beyond the optimal number, additional features do not contribute to improved model accuracy and may instead introduce noise or unnecessary complexity.

Figure 14: RFECV Results for GP2, showing the optimal selection of 4 features. The x-axis represents the number of features retained, and the y-axis depicts the average of the mean cross-validation scores, as calculated over 20 iterations.

Figure 15 illustrates the RFECV curve for the GP3 data. The mean CV score rises rapidly as more features are introduced into the model, indicating that each new feature contributes significantly to improving the model's predictive accuracy at this stage. The curve shows that the model is initially gaining valuable information from the incremental addition of features, which is reflected in the increasing CV scores. This suggests that the features added early in the sequence are highly relevant and provide new, useful information that the model can leverage to improve its predictions. As we approach the peak at '16, 0.56', the rate of increase in the mean CV score begins to slow down, indicating that we are nearing the optimal number of features. Each additional feature contributes less to the model's performance, which is typical in feature selection processes where early additions have more impact, and the benefit of adding more features diminishes as the number of features grows. The peak itself represents the point of balance where the model has incorporated just enough features to maximize its performance without yet overfitting or incorporating redundant information. Beyond this peak, as we continue to add features, the performance does not improve further and even starts to decline, as seen in the rest of the plot.
Figure 15: RFECV Results for GP3, showing the optimal selection of 16 features. The x-axis represents the number of features retained, and the y-axis depicts the average of the mean cross-validation scores, as calculated over 20 iterations.

Figure 16 illustrates the RFECV results for the GP4 data. In this figure, the mean cross-validation score ascends gradually, reflecting a steady improvement in model performance with each additional feature. This growth suggests that the early features added are each contributing meaningful information that the model is able to use to enhance its accuracy. Beyond the peak at '8, 0.524', the curve levels off, indicating that the inclusion of more features does not lead to further significant gains in mean CV score. This plateau suggests that the model has captured the most informative features, and additional features do not contribute new information that could improve the model's performance. The relatively flat line continuing to the right of the peak implies that the additional features might be redundant or irrelevant, as they do not enhance the model's predictive power.

Figure 16: RFECV Results for GP4, showing the optimal selection of 8 features. The x-axis represents the number of features retained, and the y-axis depicts the average of the mean cross-validation scores, as calculated over 20 iterations.

In Figure 17, which visualizes the RFECV results for a combined feature set from all GP levels, the trajectory of the CV scores forms a distinctive shape. Initially, there is a rapid ascent in the mean CV scores as the number of features increases, reaching an apex at the point marked '5, 0.6304'.
This peak signifies the most efficient number of features for the model, indicating that beyond this count, the additional features may not contribute significantly to the predictive power and might even introduce redundancy. After this peak, the mean CV scores exhibit a gradual decline, suggesting a plateau effect followed by a slight downward trend as the number of features continues to grow. This pattern implies that the Random Forest algorithm, while generally robust to multicollinearity due to its feature bagging approach, is not completely resistant to the diminishing returns or potential adverse effects of including too many similar or non-informative features. The overall shape of the curve reinforces the principle that beyond a certain point, adding more features can lead to overfitting, where the model becomes overly complex and less generalizable to new data.

Figure 17: RFECV Results for GPs, showing the optimal selection of 5 features. The x-axis represents the number of features retained, and the y-axis depicts the average of the mean cross-validation scores, as calculated over 20 iterations.

Once the size of the optimal feature set was identified for each data set, another iterative process was conducted to select the features that constituted the optimal feature set for each data set. This process involved iteratively performing recursive feature elimination with an RF model, consisting of 500 trees, to identify a feature set of size Nopt, where Nopt is the size of the feature set identified in Figures 12 through 17. This feature selection step was repeated 20 times, and the features selected in each iteration were counted. The Nopt features that were selected most often during this process were declared to be the optimal feature set for the input data.
During this feature selection process, 40 out of 84 features were included in the optimal feature set for at least one of the GP levels considered. In Table 7, the color-coding helps to quickly visualize which features are most frequently selected across different GP levels, indicating their importance and relevance in the classification task. Features that appear in more columns are likely to be more robust for the model. The 'nir_mean' and 'rededge_mean' features stand out as they are consistently chosen across all GP levels, suggesting that they are significant predictors regardless of the GP level or when using a combination of all levels.

Table 7: Optimized Features for GP0, GP1, GP2, GP3, GP4 and concatenated GPs.

Hyperparameter Optimization

A grid search was used to identify the optimal hyperparameters for each classifier developed in this work. The hyperparameters tuned and the range of values included in the grid search are listed in Tables 8 and 9 for the RF and SVM models, respectively.

Table 8: Range of hyperparameter values considered for tuning the Random Forest Classifier using Grid Search.

Hyperparameter       Values considered
bootstrap            TRUE, FALSE
max_depth            20, 40, 80, 100
max_features         sqrt
min_samples_leaf     Integers (1,10)
min_samples_split    Integers (2,11)
n_estimators         400, 500, 600, 800, 1000, 1200, 1500

Table 9: Range of hyperparameter values considered for tuning SVM using Grid Search.

Hyperparameter    Values considered
C                 0.1, 1, 10, 100
Degree            2, 3, 4, 5
Gamma             Scale, Auto, 0.1, 1, 10
Kernel            Linear, RBF, Poly, Sigmoid

Table 10 lists the optimized hyperparameters identified for the RF models developed using data from each of the 5 levels of the GP (GP0 to GP4) and the composite of all levels (GPs). The table is organized with hyperparameters listed in rows and the GP levels, including the combined GPs, in columns.
The 'bootstrap' hyperparameter indicates whether bootstrap sampling is used when building trees; it is set to FALSE for GP0, GP1, GP2, and the combined GPs; thus, the entire training set is utilized for tree construction. Conversely, for GP3 and GP4, this parameter is TRUE, signifying the use of bootstrap samples. The 'max_depth' hyperparameter, which restricts tree depth to prevent overfitting, was set at 20 for all levels. The 'max_features' hyperparameter determines the number of features considered at each split. The square root of the number of features was consistently selected across all levels. Variability is observed in the 'min_samples_leaf' and 'min_samples_split' hyperparameters. The 'min_samples_leaf' parameter sets the minimum number of samples at a leaf node, while the 'min_samples_split' parameter specifies the minimum number of samples required to split a node. The 'n_estimators' hyperparameter, reflecting the number of trees in the forest, also takes different values for the data aggregated at different levels of the GPs. This suggests that the pattern complexity may change at different levels of spatial aggregation. Because the GPs data set has features at different levels of spatial aggregation, it is expected that these data require a larger number of trees for optimal performance. The increase in cross-validation scores after tuning indicates that the model's accuracy was enhanced, demonstrating the importance of hyperparameter tuning for building effective RF classifiers.

Table 10: Result of RF hyperparameters tuning for GP0 to GP4 and GPs.

Hyperparameters      GP0     GP1     GP2     GP3     GP4     GPs
bootstrap            FALSE   FALSE   FALSE   TRUE    TRUE    FALSE
max_depth            20      20      20      20      20      20
max_features         sqrt    sqrt    sqrt    sqrt    sqrt    sqrt
min_samples_leaf     1       1       1       2       3       1
min_samples_split    4       4       15      2       11      5
n_estimators         500     400     500     600     400     800

Table 11 lists the optimized values of the SVM hyperparameters for the data at different GP levels and the concatenated dataset encompassing all levels (GPs).
A notable variation in the 'C' parameter across the GP levels indicates that different regularization strengths are optimal at each level, as identified during the hyperparameter tuning phase. The 'Gamma' parameter, set to 'Scale' for GP0, GP4, and the combined GPs, is computed automatically from the number of features and the variance of the training data, so that the kernel width adapts to the feature set. In contrast, fixed values are assigned to 'Gamma' for GP1, GP2, and GP3. The 'Kernel' choice remains consistent with 'RBF' for GP0 through GP4, indicating a preference for this kernel's ability to handle non-linear relationships. Conversely, a 'Linear' kernel is chosen for the combined GPs, suggesting an underlying linear separability when the GP levels are merged.

Table 11: Result of SVM hyperparameters tuning for GP0 to GP4 and GPs.

Hyperparameters    GP0      GP1     GP2     GP3     GP4      GPs
C                  100      1       100     10      1        10
Degree             2        2       2       2       2        2
Gamma              Scale    0.1     0.1     0.1     Scale    Scale
Kernel             RBF      RBF     RBF     RBF     RBF      Linear

Classifier Performance

Classification Based on the GP0 Feature Set

Figures 18 and 19 illustrate the respective performance of the GP0 RF and SVM classifiers on the validation dataset. Each cell of the confusion matrix corresponds to one combination of true and predicted class, and the proportions quoted below are taken relative to the number of cases in the true class. For instance, in Figure 18, the cell in the first row and first column indicates that 82% of the 'High' class was correctly predicted. The middle cell in the first row of this figure shows that 6% of the 'High' class instances were incorrectly predicted as 'None', and the cell in the first row and third column shows that 12% of the 'High' class instances were incorrectly predicted as 'Moderate'. The color gradient, ranging from light to dark green, reflects the magnitude of the proportions, with darker shades representing higher proportions.
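The row-wise proportions quoted above can be reproduced with a short sketch. It assumes that the first row of the GP0 RF confusion matrix holds 14, 1, and 2 of the 17 'High' validation cases (predicted as High, None, and Moderate, respectively), which is consistent with the 82%, 6%, and 12% figures in the text. The helper name `row_normalize` is hypothetical.

```python
import numpy as np


def row_normalize(counts):
    """Convert confusion-matrix counts to proportions of each true class (row)."""
    counts = np.asarray(counts, dtype=float)
    row_totals = counts.sum(axis=1, keepdims=True)
    # Rows with no cases are left as zeros instead of dividing by zero.
    return np.divide(counts, row_totals,
                     out=np.zeros_like(counts), where=row_totals > 0)


# Assumed counts for the true 'High' row: predicted High, None, Moderate.
high_row = [[14, 1, 2]]
high_props = row_normalize(high_row)
print(np.round(high_props, 2))
```

Normalizing by row makes the diagonal entry equal to the recall of that class, which is why the 82% figure describes how much of the 'High' class was recovered.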
Figure 18: RF confusion matrix for the GP0 validation set (axes: True Label versus Predicted Label; classes: High, Moderate, None).

Figure 19: SVM confusion matrix for GP0 validation set (axes: True Label versus Predicted Label; classes: High, Moderate, None).

Table 12 reports the performance of both the RF and SVM GP0 classifiers on the validation data set from July 4.

Table 12: RF and SVM classification results for the GP0 feature set.

Class           Precision RF   Precision SVM   Recall RF   Recall SVM   F1-score RF   F1-score SVM   Support RF   Support SVM
High            0.82           0.62            0.74        0.59         0.78          0.61           19           17
None            0              0               0           0            0             0              3            1
Moderate        0.46           0.78            0.67        0.54         0.55          0.64           9            13
Macro Avg       0.43           0.47            0.47        0.38         0.44          0.41           31           31
Weighted Avg    0.64           0.67            0.65        0.55         0.64          0.6            31           31

Accuracy    Value    Support
RF          0.65     31
SVM         0.55     31

From this table, it can be seen that:

▪ High Class: For the High class, the RF classifier outperforms the SVM in both precision (0.82 vs. 0.62) and recall (0.74 vs. 0.59), indicating that RF is more accurate and also more comprehensive in identifying true High-class cases. The F1-score, which balances precision and recall, is correspondingly higher for RF (0.78 vs. 0.61), affirming its superior performance in this class. This suggests that RF is more adept at handling instances in this category, which could be attributed to its ensemble approach, potentially capturing more complex patterns within the High-class features than the SVM.

▪ Moderate Class: In the Moderate class, SVM shows a higher precision (0.78 vs. 0.46) but lower recall (0.54 vs. 0.67) compared to RF. While SVM is better at correctly labeling Moderate-class instances when it does predict them, it fails to identify a significant proportion of actual Moderate-class cases, as evidenced by the lower recall. RF, although less precise, is more reliable in identifying the presence of the Moderate class, but it also misclassifies more non-Moderate instances as Moderate.
The F1-scores are quite close (0.55 for RF and 0.64 for SVM), suggesting a trade-off between precision and recall for the two models.

▪ None Class: Both classifiers fail to identify any instances of the None class, with all scores at 0. This indicates a significant challenge for both models in detecting this class, which could be due to an extremely small number of instances (Support for RF is 3 and SVM is 1). The lack of learning material for this class makes it difficult for both classifiers to establish a pattern, rendering them ineffective for the None class.

When comparing the overall performance, RF demonstrates a higher accuracy (0.65) than SVM (0.55). This suggests that across all classes, RF maintains a better balance between precision and recall, leading to more accurate classification results. The Macro Avg and Weighted Avg scores also support this, with RF showing a slight edge in performance over SVM. These metrics suggest that RF is generally more effective in classifying instances across this dataset, especially considering the Weighted Avg, which takes the support of each class into account, reflecting the real-world distribution of classes.

In summary, RF tends to have a more balanced performance across the High and Moderate classes, while SVM struggles with recall but has instances where it can be highly precise. Neither model performs well in the None class. This is probably due to an insufficient number of examples of this class in the training data. Overall, RF is more accurate and consistent across classes, making it a more reliable choice for classification in this scenario.

Classification Based on the GP1 Feature Set

Figures 20 and 21 illustrate the respective performance of the GP1 RF and SVM classifiers on the validation dataset.

Figure 20: RF confusion matrix for the GP1 validation set (axes: True Label versus Predicted Label; classes: High, Moderate, None).
Figure 21: SVM confusion matrix for the GP1 validation set (axes: True Label versus Predicted Label; classes: High, Moderate, None).

Table 13 reports the performance of both the RF and SVM classifiers on the GP1 validation data set from July 4.

Table 13: RF and SVM classification results for the GP1 feature set.

Class           Precision RF   Precision SVM   Recall RF   Recall SVM   F1-score RF   F1-score SVM   Support RF   Support SVM
High            0.65           0.63            0.73        0.71         0.69          0.67           15           17
None            0              0.2             0           1            0             0.33           2            1
Moderate        0.62           0.57            0.57        0.31         0.59          0.4            14           13
Macro Avg       0.42           0.47            0.43        0.67         0.43          0.47           31           31
Weighted Avg    0.59           0.59            0.61        0.55         0.6           0.54           31           31

Accuracy    Value    Support
RF          0.61     31
SVM         0.55     31

From this table, it can be seen that:

▪ High Class: Both classifiers perform comparably in precision for the High class, with RF at 0.65 and SVM at 0.63. However, RF has a slightly better recall of 0.73 compared to SVM's 0.71, indicating that RF is marginally better at capturing the majority of High-class cases. The F1-scores are also similar, with RF at 0.69 and SVM at 0.67, indicating balanced precision and recall for both classifiers in this category.

▪ Moderate Class: In the Moderate class, RF demonstrates a higher precision (0.62 vs. 0.57) and recall (0.57 vs. 0.31) compared to SVM, which is mirrored in the F1-score (0.59 for RF vs. 0.4 for SVM). This indicates that RF is more adept at correctly identifying and capturing Moderate-class instances than SVM, which is less reliable in recognizing true Moderate cases.

▪ None Class: The None class presents an interesting contrast. RF fails to identify any None-class instances (precision, recall, and F1-score at 0), whereas SVM, with a low precision of 0.2 but a perfect recall of 1, manages an F1-score of 0.33. This suggests that while SVM is able to recognize the None-class instances, it tends to misclassify other classes as None, as evidenced by its low precision.
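As a worked check on how the F1-scores above follow from precision and recall, the sketch below applies the standard harmonic-mean definition, F1 = 2PR / (P + R), to the rounded values reported in Table 13. The helper name `f1_from` is hypothetical, and returning 0 when both inputs are 0 is a convention assumed here to match the zero scores in the table.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall; 0 when both are 0 (assumed convention)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded precision and recall values taken from Table 13.
svm_none_f1 = f1_from(0.2, 1.0)      # SVM, None class
rf_high_f1 = f1_from(0.65, 0.73)     # RF, High class
rf_moderate_f1 = f1_from(0.62, 0.57) # RF, Moderate class
rf_none_f1 = f1_from(0.0, 0.0)       # RF, None class
print(round(svm_none_f1, 2), round(rf_high_f1, 2), round(rf_moderate_f1, 2))
```

The SVM None-class case shows why a perfect recall does not rescue the F1-score: the harmonic mean is pulled toward the smaller of the two values, here the precision of 0.2.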
Based on these results, it can be seen that RF is more consistent and accurate across the majority of classes and the dataset as a whole. SVM shows some strengths, particularly in the None class, but its higher Macro Average recall does not translate to better overall accuracy or balance between precision and recall, as reflected in the lower overall accuracy and Weighted Average F1-score. RF's higher values in these key metrics make it the more reliable classifier for this particular dataset at GP1.

Classification Based on the GP2 Feature Set

Figures 22 and 23 illustrate the respective performance of the GP2 RF and SVM classifiers on the validation dataset.

Figure 22: RF confusion matrix for the GP2 validation set (axes: True Label versus Predicted Label; classes: High, Moderate, None).

Figure 23: SVM confusion matrix for the GP2 validation set (axes: True Label versus Predicted Label; classes: High, Moderate, None).

Table 14 reports the performance of both the RF and SVM classifiers on the validation data set from July 4.

Table 14: RF and SVM classification results for the GP2 feature set.

Class           Precision RF   Precision SVM   Recall RF   Recall SVM   F1-score RF   F1-score SVM   Support RF   Support SVM
High            0.59           0.64            0.91        0.94         0.71          0.76           11           17
None            1              0               0.25        0            0.4           0              4            1
Moderate        0.77           1               0.62        0.38         0.69          0.56           16           13
Macro Avg       0.79           0.55            0.59        0.44         0.6           0.44           31           31
Weighted Avg    0.73           0.77            0.66        0.68         0.66          0.65           31           31

Accuracy    Value    Support
RF          0.68     31
SVM         0.68     31

From this table, it can be seen that:

▪ High Class: For the High class, SVM has a slightly higher precision (0.64) than RF (0.59), suggesting that SVM is marginally better at correctly identifying the High class when it predicts an instance as High. In recall, SVM also has an edge (0.94) over RF (0.91), indicating that SVM is better at capturing the true High-class instances within the dataset.
The F1-score, which considers both precision and recall, is higher for SVM (0.76) compared to RF (0.71), confirming SVM's better performance for the High class.

▪ Moderate Class: SVM shows perfect precision (1.0) for the Moderate class, which means all instances SVM predicts as Moderate are correct. However, its recall is only 0.38, indicating it misses a large number of true Moderate-class instances. RF has a lower precision (0.77) but a higher recall (0.62), which suggests it captures more of the Moderate-class instances but has some false positives. The F1-score is higher for RF (0.69) than SVM (0.56), indicating a better balance of precision and recall for RF in the Moderate class.

▪ None Class: RF achieves perfect precision (1.0) but has a low recall (0.25), which means it can correctly identify the None class when it predicts it, but it misses many actual instances of the None class. SVM, on the other hand, has zero recall, indicating it failed to identify any true None-class instances, leading to an F1-score of 0. Despite RF's limited recall, its ability to identify some None-class instances gives it a better F1-score (0.4) compared to SVM (0), which completely fails in this class.

The Macro Average precision is significantly higher for RF (0.79) compared to SVM (0.55), suggesting RF is more precise on average across all classes. The Macro Average recall is also better for RF (0.59 vs. 0.44 for SVM), indicating RF is more effective at capturing true positives across the board. The Weighted Average precision is slightly better for SVM (0.77) compared to RF (0.73), while the Weighted Average recall is the same for both classifiers (0.68), leading to very similar F1-scores (RF: 0.66, SVM: 0.65). The overall accuracy for both RF and SVM is the same (0.68), indicating that both classifiers correctly predict the class labels for 68% of the dataset.
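The difference between the Macro and Weighted Averages discussed above can be illustrated with the SVM precision values from Table 14. This is a minimal sketch using the rounded per-class values and the SVM supports (17 High, 1 None, 13 Moderate); the variable names are illustrative only.

```python
import numpy as np

# Per-class SVM precision and support from Table 14, ordered High, None, Moderate.
svm_precision = np.array([0.64, 0.0, 1.0])
svm_support = np.array([17, 1, 13])

# Macro average: every class counts equally, regardless of its size.
macro_precision = svm_precision.mean()

# Weighted average: each class is weighted by its number of cases (support).
weighted_precision = np.average(svm_precision, weights=svm_support)

print(round(macro_precision, 2), round(weighted_precision, 2))
```

Because the None class has a single case, its precision of 0 lowers the macro average sharply but barely affects the weighted average, which explains the gap between the 0.55 and 0.77 values reported for SVM.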
While RF and SVM have the same overall accuracy at the GP2 level, RF performs better in the Moderate and None classes, particularly in terms of recall. SVM, however, performs slightly better in the High class, with better precision and recall. The Macro Averages favor RF, indicating that it has a better average performance across all classes, but the Weighted Averages are very similar, reflecting the balanced performance of both classifiers when the class distribution is taken into account.

Classification Based on the GP3 Feature Set

Figures 24 and 25 illustrate the respective performance of the GP3 RF and SVM classifiers on the validation dataset.

Figure 24: RF confusion matrix for the GP3 validation set (axes: True Label versus Predicted Label; classes: High, Moderate, None).

Figure 25: SVM confusion matrix for the GP3 validation set (axes: True Label versus Predicted Label; classes: High, Moderate, None).

Table 15 reports the performance of both the RF and SVM classifiers on the validation data set from July 4.

Table 15: RF and SVM classification results for the GP3 feature set.

Class           Precision RF   Precision SVM   Recall RF   Recall SVM   F1-score RF   F1-score SVM   Support RF   Support SVM
High            0.71           0.63            0.67        0.71         0.69          0.67           18           17
None            1              0               0.2         0            0.33          0              5            1
Moderate        0.46           0.67            0.75        0.46         0.57          0.55           8            13
Macro Avg       0.72           0.43            0.54        0.39         0.53          0.4            31           31
Weighted Avg    0.69           0.63            0.61        0.58         0.6           0.59           31           31

Accuracy    Value    Support
RF          0.61     31
SVM         0.58     31

From this table, it can be seen that:

▪ High Class: RF shows a precision of 0.71, higher than SVM's 0.63. This suggests that when RF predicts an instance as High, it is more likely to be correct. However, SVM has a slightly higher recall (0.71) than RF (0.67), indicating SVM is marginally better at identifying all relevant instances of the High class within the dataset.
The F1-scores for both classifiers are nearly identical (RF: 0.69, SVM: 0.67), suggesting that both classifiers have a similar balance between precision and recall for the High class.
▪ Moderate Class: SVM shows higher precision (0.67) compared to RF (0.46), suggesting SVM is more accurate when it labels an instance as Moderate. Conversely, RF demonstrates a higher recall (0.75) than SVM (0.46), which suggests that RF is better at detecting the true Moderate-class instances but also has more false positives. RF has an F1-score of 0.57, slightly higher than SVM's 0.55, indicating a slightly better balance between precision and recall for RF in the Moderate class.
▪ None Class: RF achieves a perfect precision (1.0) but has a very low recall (0.2), indicating it is selective and accurate when predicting an instance as None but fails to identify most actual None instances. SVM does not correctly identify any None-class instances, as evidenced by a recall of 0, leading to an F1-score of 0, which indicates a complete miss for the None class by the SVM. Despite its low recall, RF's ability to identify some None-class instances results in an F1-score of 0.33, indicating a better performance than SVM for the None class.
The Macro Average precision is substantially higher for RF (0.72) than SVM (0.43), indicating that RF is, on average, more precise across all classes. The Macro Average recall for RF (0.54) is also higher than SVM (0.39), suggesting that RF captures true positives across all classes more effectively. The Weighted Average precision for RF (0.69) is greater than SVM's (0.63), and the Weighted Average recall for RF (0.61) is also higher than SVM's (0.58). This results in a slightly better F1-score for RF (0.6 vs. 0.59 for SVM), again showing a more balanced performance. The overall accuracy is higher for RF (0.61) than SVM (0.58), indicating that RF is more effective across the entire dataset.
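The per-class, Macro Average, and Weighted Average values discussed above all derive from a confusion matrix. The following is a minimal Python sketch of that arithmetic, not the code used in this study. The counts in `CONFUSION` are an assumption: they were chosen so that the computed values reproduce the RF columns of Table 15, and treating rows as true labels is a convention of this sketch rather than something taken from Figure 24. The function names are hypothetical.

```python
# Illustrative only: rows are true labels, columns are predicted labels.
# These counts are an assumption, chosen to be consistent with the RF
# precision, recall, and support values reported in Table 15.
CLASSES = ["High", "Moderate", "None"]
CONFUSION = [
    [12, 6, 0],  # true High
    [2, 6, 0],   # true Moderate
    [3, 1, 1],   # true None
]


def per_class_metrics(cm):
    """Return {class index: (precision, recall, f1, support)}."""
    n = len(cm)
    metrics = {}
    for k in range(n):
        tp = cm[k][k]
        predicted = sum(cm[i][k] for i in range(n))  # column sum
        support = sum(cm[k])                          # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[k] = (precision, recall, f1, support)
    return metrics


def summarize(cm):
    """Macro and support-weighted averages of (precision, recall, f1), plus accuracy."""
    m = per_class_metrics(cm)
    total = sum(v[3] for v in m.values())
    macro = tuple(sum(v[j] for v in m.values()) / len(m) for j in range(3))
    weighted = tuple(sum(v[j] * v[3] for v in m.values()) / total for j in range(3))
    accuracy = sum(cm[k][k] for k in range(len(cm))) / total
    return {"macro": macro, "weighted": weighted, "accuracy": accuracy}


summary = summarize(CONFUSION)
for name, values in summary.items():
    print(name, values)
```

With these assumed counts, the Macro Average precision rounds to 0.72 and the accuracy to 0.61. The Weighted Average recall is always equal to the overall accuracy, because weighting each class recall by its support and dividing by the total simply counts the correctly classified examples. This is why the two values coincide in the tables of this chapter.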
In summary, RF generally exhibits a better performance across the High and Moderate classes and a significantly better performance for the None class despite its low recall. RF's higher Macro and Weighted Average metrics, as well as its higher overall accuracy, indicate that it is the more reliable classifier for this dataset at the GP3 level. SVM may have its strengths, particularly in precision within the Moderate class, but RF provides a more consistent and effective performance overall.
Classification Based on the GP4 Feature Set
Figures 26 and 27 illustrate the respective performance of the GP4 RF and SVM classifiers on the validation dataset.
Figure 26: RF confusion matrix for the GP4 validation set.
Figure 27: SVM confusion matrix for the GP4 validation set.
Table 16 reports the performance of both the RF and SVM classifiers on the validation data set from July 4.
Table 16: RF and SVM classification results for the GP4 feature set.
Class         Precision       Recall          F1-score        Support
              RF      SVM     RF      SVM     RF      SVM     RF    SVM
High          0.59    0.6     0.59    0.88    0.59    0.71    17    17
None          1       0       0.25    0       0.4     0       4     1
Moderate      0.38    0.8     0.5     0.31    0.43    0.44    10    13
Macro Avg     0.66    0.47    0.45    0.4     0.47    0.39    31    31
Weighted Avg  0.58    0.66    0.52    0.61    0.51    0.58    31    31
Accuracy      Value   Support
RF            0.52    31
SVM           0.61    31
From this table, it can be seen that:
▪ High Class: The precision for RF is slightly lower (0.59) than for SVM (0.6), which means SVM is marginally more accurate when it predicts an instance as High. However, SVM significantly outperforms RF in recall (0.88 vs. 0.59), suggesting that SVM is much better at identifying all relevant High-class instances. Consequently, the F1-score for SVM (0.71) is notably higher than for RF (0.59), indicating a better balance between precision and recall for SVM in the High class.
▪ Moderate Class: SVM exhibits a much higher precision (0.8) compared to RF (0.38), which means it is more accurate when it labels an instance as Moderate. RF has a higher recall (0.5) than SVM (0.31), suggesting that RF is better at detecting true Moderate-class instances but also includes more false positives. The F1-scores are similar but slightly higher for SVM (0.44) than for RF (0.43), indicating a slightly better balance for SVM in the Moderate class.
▪ None Class: RF achieves perfect precision (1.0), indicating that every instance it labels as None is correct, but its recall is very low (0.25), which means it misses most actual None instances. SVM does not identify any true None-class instances, reflected by a recall of 0 and, consequently, an F1-score of 0. RF, therefore, performs better for the None class with a modest F1-score of 0.4 due to its ability to identify some true None instances.
The Macro Average precision is higher for RF (0.66) compared to SVM (0.47), indicating that RF is more precise across all classes on average. The Macro Average recall is also slightly higher for RF (0.45) compared to SVM (0.4), suggesting that RF is marginally better at capturing true positives when all classes are weighted equally. The Weighted Average precision is higher for SVM (0.66) compared to RF (0.58), and the Weighted Average recall is also higher for SVM (0.61) compared to RF (0.52), leading to a better F1-score for SVM (0.58 vs. 0.51 for RF). Notably, the overall accuracy is higher for SVM (0.61) than for RF (0.52), indicating that SVM is more effective at correctly predicting class labels across the entire dataset.
For GP4, while RF demonstrates higher precision on average across all classes, SVM has a better recall for the High class and a higher overall accuracy. This suggests that SVM is more adept at classifying this dataset, particularly for the most represented High class, which seems to drive its higher overall performance.
Despite RF's perfect precision in the None class, its failure to identify the majority of None instances results in its lower overall accuracy.
Classification Based on the Concatenated GPs Feature Set
Figures 28 and 29 illustrate the respective performance of the concatenated GPs RF and SVM classifiers on the validation dataset.
Figure 28: RF confusion matrix for the concatenated GPs validation set.
Figure 29: SVM confusion matrix for the concatenated GPs validation set.
Table 17 reports the performance of both the RF and SVM classifiers on the validation data set from July 4.
Table 17: RF and SVM classification results for the concatenated GPs feature set.
Class         Precision       Recall          F1-score        Support
              RF      SVM     RF      SVM     RF      SVM     RF    SVM
High          0.24    0.5     0.41    0.6     0.3     0.54    49    85
None          0       0.08    0       0.4     0       0.13    23    5
Moderate      0.55    0.42    0.43    0.17    0.49    0.24    83    65
Macro Avg     0.26    0.33    0.28    0.39    0.26    0.3     155   155
Weighted Avg  0.37    0.45    0.36    0.41    0.35    0.4     155   155
Accuracy      Value   Support
RF            0.36    155
SVM           0.41    155
From this table, it can be seen that:
▪ High Class: RF has low precision (0.24) and moderate recall (0.41) for the High class, resulting in a low F1-score (0.3). This indicates that RF is not very accurate when it identifies an instance as High and misses a significant number of High instances. SVM outperforms RF in both precision (0.5) and recall (0.6) for the High class, with a substantially higher F1-score (0.54). This suggests that SVM is not only more accurate when it predicts an instance as High but also better at capturing more of the true High instances.
▪ Moderate Class: RF shows decent precision (0.55) and moderate recall (0.43) for the Moderate class, with a corresponding F1-score (0.49). This indicates that RF is relatively accurate and reliable in identifying Moderate-class instances.
SVM has lower precision (0.42) and significantly lower recall (0.17) for the Moderate class compared to RF, resulting in a lower F1-score (0.24). This suggests SVM is less effective at detecting true Moderate instances.
▪ None Class: RF fails to identify any None-class instances, with precision, recall, and F1-score all at 0. This indicates a significant limitation of the RF model in detecting the None class. SVM shows very low precision (0.08) but a higher recall (0.4) for the None class, leading to a low F1-score (0.13). While SVM does manage to identify some true None-class instances, it also has a high rate of false positives.
The Macro Average precision is slightly higher for SVM (0.33) than RF (0.26), and the Macro Average recall is also higher for SVM (0.39) compared to RF (0.28). This could indicate that SVM has a slight edge in detecting true positives on average across all classes. The Weighted Average precision is higher for SVM (0.45) than for RF (0.37), and the Weighted Average recall is also higher for SVM (0.41) compared to RF (0.36). This suggests that SVM has a better overall performance when considering the distribution of classes in the dataset. The overall accuracy is higher for SVM (0.41) than for RF (0.36), indicating that SVM is more effective at correctly classifying instances across the concatenated GP levels.
In conclusion, SVM generally shows better performance than RF across concatenated GP levels, particularly in the High class, which drives its higher overall accuracy. RF has its strengths, performing better in the Moderate class, but fails to identify any instances in the None class. Despite its limitations, SVM provides a more balanced performance across all classes, making it the preferable model in this comparative analysis.
Discussion
Table 7 reveals significant insights into the feature selection process for image classification tasks.
It shows that out of 84 features, fewer than half (40) are useful for classifying the relative abundance of spotted knapweed, which emphasizes the importance of feature optimization in image classification. The analysis of RF classifier performance as a function of the number of features used for classification, as shown in Figures 12 through 17, indicates that increasing the number of features initially benefits model performance up to a certain point. Beyond this optimal number, however, including additional features tends to diminish the classifier's effectiveness, underscoring that even for models tolerant to multicollinearity, like RF, there is a threshold beyond which feature redundancy becomes counterproductive.
Of all the features considered in this study, the mean NIR and mean red-edge reflectance features appear to be the most useful for classifying the relative abundance of spotted knapweed, because they are included in the optimal feature sets for all GP levels and the concatenated GPs set. This selection aligns with our understanding of plant physiology; the chlorophyll absorption and cell structure reflection properties in these spectral ranges are critical for vegetation identification. The consistent selection of these features across different scale-space models underscores their robustness and importance in capturing the spectral signature of vegetation.
Surprisingly, mean red reflectance, typically a vital component in traditional vegetation indices like NDVI, is absent from the optimized feature sets. This omission might indicate that the models leverage other features that capture similar information. For example, the reNDVI, which is formulated using the NIR and red-edge bands, appears in all but the GP2 feature set, and the combination of these longer-wavelength bands provides a significant capability for vegetation classification.
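For reference, the two indices mentioned above can be written as normalized differences. The sketch below uses the standard NDVI definition, (NIR - red) / (NIR + red). The reNDVI form shown, with the red band replaced by the red-edge band, is an assumption based on the description of reNDVI as an index formulated from the NIR and red-edge bands. The function names and reflectance values are hypothetical and are not taken from the study data.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or 0.0 when both inputs are zero."""
    total = a + b
    return (a - b) / total if total else 0.0


def ndvi(nir, red):
    """Standard NDVI from NIR and red reflectance."""
    return normalized_difference(nir, red)


def re_ndvi(nir, red_edge):
    """Assumed reNDVI form: the red band of NDVI replaced by the red-edge band."""
    return normalized_difference(nir, red_edge)


# Hypothetical reflectances for a single vegetated pixel (not measured values).
pixel = {"red": 0.05, "red_edge": 0.20, "nir": 0.45}
print(round(ndvi(pixel["nir"], pixel["red"]), 3))
print(round(re_ndvi(pixel["nir"], pixel["red_edge"]), 3))
```

Because both indices share the NIR band, they are correlated for healthy vegetation, which is one way a model could recover red-band information indirectly without selecting mean red reflectance itself.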
Green spectral band features, such as the mean and standard deviation of green reflectance, the mean of the green-band GLCM-based entropy, and the range of the green-band GLCM-based angular second moment (ASM), appear in half of the optimized feature sets. Their presence highlights the role of the spatial pattern of green-band reflectance in capturing vegetation structure and texture, which may be pertinent to distinguishing spotted knapweed, especially considering its sparse canopy that allows for shadowing effects and visibility of the ground layer.
Aside from the NIR, red-edge, and certain green-band features, most other features might be specific to the nuances of particular GP levels. This specificity suggests a degree of customization in the feature sets, where each GP level may present unique characteristics that require a tailored approach to feature selection. These findings suggest that for remote sensing tasks like vegetation classification, a focused set of well-chosen features can significantly enhance model performance by capturing essential information without overburdening the model with redundant data, and that this feature set may change depending on the spatial resolution of the available imagery.
The comparative performance analysis between SVM and RF models on various GP levels highlights a significant observation: GP2 outperforms other levels regarding model accuracy and balanced F1-scores. With an accuracy of 0.68, GP2 features align exceptionally well with the classification objective of discerning the relative abundance of spotted knapweed. Intriguingly, the feature set optimized for GP2 is the most compact among all GP levels, consisting of only four features. These are:
• Mean NIR-band reflectance, which is critical for assessing vegetation health as NIR radiation is strongly reflected by healthy vegetation.
• Mean red-edge band reflectance, which is a sensitive indicator of chlorophyll content and plant stress.
• Mean green-band entropy, which is a metric of the variability/homogeneity of the green-band reflectance, which may relate to the structure of the vegetative canopy or the presence of different plant species in close proximity to each other.
• Mean NDVI entropy, which is a metric of variability of the NDVI (itself a measure of vegetative vigour) within a metapixel and may relate to sparse canopies through which bare soil can be seen.
The four features that comprise the optimal GP2 feature set leverage data from all four of the spectral bands measured by the Sequoia multi-spectral imager (red band data is included in the NDVI). The mean reflectance values in the red-edge and NIR bands are particularly effective because these bands are highly responsive to the presence of vegetation. The GLCM-based entropy features (ENT-green and ENT-NDVI) are likely to capture the unique textural patterns associated with spotted knapweed's sparse canopy and/or groups of heterogeneous plants in close proximity. The irregularity in canopy cover can create shadows and allow glimpses of the sub-canopy vegetation or bare soil, contributing to a diverse texture signature.
The four features that comprise the GP2 optimal feature set construct a classifier that outperforms classifiers built using data aggregated at different levels of the GPs or multiscale data from all of the GP levels combined. This suggests that the models built with GP2 data are leveraging both the spectral signatures of the vegetation and the textural context provided by the surrounding environment, which includes shadows and sub-canopy elements. The optimal feature set at GP2 appears to strike a balance, improving classification by increasing the signal-to-noise ratio through spatial averaging in the spectral data. This improvement to the signal-to-noise ratio would not only emphasize the spectral signatures but also enhance the classifier's ability to differentiate between spotted knapweed and other vegetation.
It also opens up the possibility of refining the feature selection process to home in on those attributes that most effectively capture the characteristics of the target species, potentially leading to even more streamlined and efficient models.
The GP2 features correspond to a GSD of 11.6 cm, which is four times the original image GSD of 2.9 cm. This specific GSD appears to strike a balance between detail and abstraction, providing an optimal scale for the classification task at hand. Data at finer resolutions (lower GSD) may contain more noise due to measurement error or overly intricate spatial details that could complicate the classification process, whereas coarser resolutions (higher GSD) may lack the requisite spatial detail for distinguishing between different classes. The results suggest that, within the context of this study, smoothing the data to 11.6 cm spatial resolution increases the signal-to-noise ratio of the input features, allowing the classifiers to produce more accurate results with the fewest number of descriptive features compared to the other feature sets considered in this work.
The results obtained from concatenating GP levels, however, have led to an unexpected outcome that contrasts with the findings of Roberti de Siqueira et al. (2013), who showed that a multi-scale representation of image features often produces higher-performing classifiers. The anticipated advantage of a multi-scale feature set that leverages information from various scales did not materialize. Instead, the concatenated GPs models exhibited diminished accuracy, with the RF classifier achieving only 0.36 and the SVM classifier faring slightly better at 0.41. Such outcomes suggest that the multi-scale concatenation diluted rather than enriched the discriminative capability inherent to the features at individual levels.
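The relationship between GP level and GSD quoted above can be checked with a few lines of arithmetic. This is a sketch under the assumption that each pyramid level halves the image resolution in each dimension, which is the usual Gaussian pyramid construction and is consistent with GP2 corresponding to four times the raw GSD. The function names are hypothetical.

```python
BASE_GSD_CM = 2.9  # GSD of the raw imagery (GP0), as reported in the text


def gsd_at_level(level, base_gsd_cm=BASE_GSD_CM):
    """GSD after `level` halvings of resolution (assumed factor of 2 per level)."""
    if level < 0:
        raise ValueError("level must be non-negative")
    return base_gsd_cm * 2 ** level


def raw_pixels_per_pixel(level):
    """Number of GP0 pixels covered by the footprint of one pixel at this level."""
    return 4 ** level


for level in range(5):  # GP0 to GP4
    print(f"GP{level}: GSD = {gsd_at_level(level):.1f} cm, "
          f"footprint covers {raw_pixels_per_pixel(level)} raw pixels")
```

Under this assumption, each GP2 pixel summarizes a 4 x 4 block of raw pixels, which is the spatial averaging that the signal-to-noise argument above relies on.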
This dilution effect implies that the distinct and nuanced patterns captured by each GP level's features, which might be critical for effective classification, lose their impact when combined. This reduction in performance highlights a potential discrepancy between theoretical expectations and empirical realities, suggesting that the synergistic potential of concatenated multi-scale features may not always hold true across different datasets or classification frameworks. It is also possible that recursive feature selection is not able to identify a truly optimal feature set given the volume and multicollinearity of these features. Recursive feature selection is a greedy feature optimization heuristic that could lead to degraded performance if many features have similar importance metrics. This could be overcome using a global optimization method like a genetic algorithm (Goldberg, 1994).
In conclusion, this study underscores the significance of selecting an appropriate scale, especially when integrating data sources like GLCM and GP. The superior performance at the 11.6 cm GSD demonstrates the critical role of resolution in image classification tasks.
Mapping Spotted Knapweed at Site 3
To see the impact of selecting the optimal scale-space representation of the RPAS-acquired imagery for image classification, maps of the relative abundance of spotted knapweed were produced from spectral data at the GP0 and GP2 levels of aggregation collected at Site 3 on July 4. These datasets were withheld from the model-building process, and labeled examples were only utilized to calculate the classifiers' performance metrics (e.g. in Table 12).
An aerial photograph of Site 3 taken during the July 4 imaging flight is shown in Figure 30. This figure shows that there is a diagonal pattern of green vegetative growth that slopes from the top left to the bottom right.
Dense green vegetation also dominates the top right and bottom left corners of the image, while a strip of reddish-grey vegetation runs through the middle of the image. The dark green vegetation is mostly spotted knapweed interspersed with other vegetation, while the reddish-grey vegetation is a patch of cheatgrass (Bromus tectorum L.), another invasive species common in British Columbia’s grasslands.
Figure 31 was constructed by applying the GP0 RF classifier to the Site 3 data. This classifier replicates the results presented by Baron and Hill (2020). This figure has a high degree of speckling (i.e. single metapixels with a category that is different from the surrounding metapixels) in the None and Moderate categories. The VNIR image in Figure 30 does not reveal much patchiness, so this speckle is likely an artifact of the classifier. The precision of the GP0 RF classifier for the Moderate and None categories of spotted knapweed is 0 and 0.46, respectively, so this speckling is likely due to false positive classifications. Furthermore, the recall of the Moderate category is 0, which gives no confidence in these classifications.
Figure 32 was constructed by applying the GP2 RF classifier to the Site 3 data. Based on the results of this study, this classifier uses the optimal scale-space representation of the image data. This image exhibits much less speckle than Figure 31. Instead, there is more contiguity between pixels in each class, forming larger aggregate groupings that follow the diagonal patterning visible in the VNIR image (Figure 30). The precision of the Moderate and None classes for this classifier is much higher than for the GP0 RF classifier, which leads to more confidence in these classifications, though the recall of the Moderate class is still quite low.
The map of spotted knapweed made with the GP2 RF classifier (Figure 32) shows smoother transitions between different abundance classes, whereas the transitions between classes in the map made by the GP0 RF classifier are abrupt. This result suggests that the GP2 scale-space representation of the image data better describes the transitions in knapweed distribution.
When comparing the predicted area of spotted knapweed abundance, the GP0 RF model (Figure 31) classifies 8,358 square meters as High, 2,926 square meters as Moderate, and 2,636 square meters as None, whereas the GP2 RF model (Figure 32) classifies 20,288 square meters as High, 25,984 square meters as Moderate, and 9,408 square meters as None.
The adoption of the GP2 data suggests that when the image data is represented at a scale that incorporates a moderate level of smoothing (enough to reduce noise and enhance the meaningful signal), it results in a more accurate depiction of the knapweed's spatial distribution. This scale-space optimization allows the GP2 RF model to capture essential characteristics of the knapweed's presence, which might be missed at the finer, less-smoothed GP0 level. The more pronounced smoothing inherent in the GP2 data likely helps to suppress irrelevant variations, thereby strengthening the model's ability to detect the true signal of knapweed abundance. This optimized scale-space representation is key to producing a more reliable and comprehensive map of knapweed distribution, which is crucial for effective monitoring and management of this invasive species.
Figure 30: True-colour image from flight data collected at field site 3 on July 4, 2018.
Figure 31: RF classification map generated using GLCM-GP0 metapixel-based image analysis, illustrating the relative abundance of spotted knapweed.
Figure 32: RF classification map generated using GLCM-GP2 metapixel-based image analysis, illustrating the relative abundance of spotted knapweed.
This level demonstrates the highest accuracy.
Conclusion
This study explored scale-space representations of remotely sensed images to identify optimal features and their relationship with spatial scale, contributing to the understanding of how these representations affect the accuracy of vegetation prediction models. The identification of mean NIR reflectance and mean red-edge reflectance as significant features across all GP levels is a pivotal finding, emphasizing their importance in capturing the distinctive spectral characteristics of vegetation. These features are particularly sensitive to chlorophyll content, a key indicator of plant health, and the cell structure of vegetation, which is crucial for differentiating between various plant species and conditions. That these two features were selected across different scale-space representations of the image data indicates their importance for identifying and quantifying the relative abundance of spotted knapweed.
Upon examining the classification models, including both RF and SVM, it is evident that the GP2 level exhibits superior performance in terms of accuracy. Notably, the Ground Sample Distance (GSD) of the imagery from the GP2 level, at 11.6 cm, contrasts with the 2.9 cm GSD of the raw imagery. This difference suggests that the smoothing inherent in the GP2 data, which reduces noise and minor variations that may not be relevant to the classification task, likely contributes to improved performance. The GP2 level, with its moderate smoothing, seems to strike an optimal balance between preserving critical information and reducing extraneous detail that could confuse the model.
These results carry significant implications for remote sensing analysis. They challenge the conventional approach that typically relies on the raw sensor resolution to dictate the scale of analysis, suggesting instead the potential benefits of scale-space analysis.
By optimizing the scale of feature extraction, it is possible to enhance the accuracy of vegetation classification models. Moreover, our findings challenge the common assumption that lower spatial-resolution data inherently yields inferior results, an assumption that drives RPAS-based remote sensing surveys to acquire the highest spatial resolution that is practicable. In fact, the enhanced performance at the GP2 level implies that a compromise in spatial resolution does not necessarily equate to a loss in the quality of analytical outcomes. Consequently, assuming that GSD scales linearly with flight height for a fixed sensor, the RPAS data acquisition could have been conducted at a higher elevation above the land surface, around 120 m (rather than 30 m), resulting in an imagery resolution of 11.6 cm. This adjustment in flight height would have increased the overall size of the scene captured in each RPAS-acquired image, which would have reduced the total number of images that needed to be captured to complete the survey, thereby increasing the speed with which the data was acquired. Such an approach would enable the imaging of larger areas within the same flight time, significantly improving the operational efficiency of the vegetation monitoring efforts.
Chapter 4 Conclusion & Future Works
Conclusion
The principal aim of this research was to explore the value of scale-space representations of image features to predict the abundance of spotted knapweed within grassland ecosystems. Specifically, this work sought to determine if there was an optimal spatial resolution for image features or if multi-scale representations of features could improve the prediction of spotted knapweed abundance using spectral and grey-level co-occurrence matrix (GLCM)-based textural features. The Gaussian Pyramid (GP) method was instrumental in determining the spatial resolution that encapsulates the critical data for robust predictive modeling.
Notably, this work shows that image features derived from the second level of the Gaussian pyramid (GP2 level) produced classifiers that outperformed classifiers trained using the base-resolution image data or image data aggregated to other spatial scales. At the GP2 level, features are derived from image data with four times lower spatial resolution than the original data. Interestingly, this research showed that classifiers trained using GP2 data produced better results than classifiers trained using multi-resolution data from all levels in the GP. This latter result contradicts the work of Roberti de Siqueira et al. (2013), who showed that a multi-scale representation of image features often produces higher-performing classifiers. I suspect that this result may be due to the relatively few training examples (N=150) compared to the number of features across the entire scale space (N=420).
Contrary to the typical remote-sensing approach that equates higher spatial resolution with improved analytical performance, our study proposes that an intermediate resolution, as represented by GP2, may provide a more accurate reflection of ground realities by filtering out noise and minor variations that do not contribute to the species' identification.
The implications for grassland management are profound. Accurately predicting the abundance of spotted knapweed, a virulent invasive species, enables resource managers to deploy control treatments more judiciously, concentrating on areas heavily invaded and conserving efforts in regions with lower incidence. This study illustrates the potential of scale-space analysis to increase invasive species mapping accuracy and bolster the efficacy of ecological management tactics.
More accurate invasive species mapping significantly contributes to grassland management in the following areas:
▪ Early Detection and Rapid Response: The model's precision in identifying spotted knapweed is pivotal for detecting invasions early, which is crucial for rapid response actions to halt the spread before establishment.
▪ Precision Management: Precise spatial estimates of spotted knapweed abundance allow for targeted management. This enables more efficient allocation of resources like herbicides, labor for manual removal, and biocontrol agents, focusing on priority areas and reducing non-target impacts.
▪ Monitoring Treatment Efficacy: Post-treatment monitoring is vital for evaluating management success. Accurate maps of spotted knapweed abundance derived from pre- and post-intervention remote sensing imagery can be compared to compute intervention success metrics.
▪ Adaptive Management: Invasive species management is dynamic, requiring adaptation to change. The invasive species mapping method developed in this work supports continuous monitoring, yielding data to inform adaptive management decisions and strategy adjustments.
▪ Habitat Restoration Planning: By delineating spotted knapweed's spatial distribution, the model supports restoration planning, identifying reintroduction sites for native species and guiding restoration efforts to restore native plant community structure and ecosystem functions.
▪ Cost-Effective Surveying: The case study explored in Chapter 3 revealed that the optimal image resolution for the mapping of spotted knapweed abundance was four times coarser than the resolution at which the imagery was acquired. This result suggests that the images could have been acquired at a much higher flight level. Flying a remote sensing platform (e.g., a remotely piloted aircraft system, RPAS) at a higher flight level will enable the acquisition of images with a larger field of view, reducing the time required to image a fixed spatial extent.
Thus, by tuning the flight level to the optimal spatial resolution for image analysis, larger areas can be surveyed more economically, enabling more frequent surveys and better-informed management decisions.
In conclusion, the application of this model within grassland management programs can significantly enhance the effectiveness of spotted knapweed control and mitigation efforts. Its integration into routine monitoring practices represents a proactive step towards sustainable management of grassland ecosystems.
While the model devised in this study is tailored to a specific species and may not directly translate to other species, the underlying modeling framework I have established has the potential for broader application. To adapt this approach for a different species, it would be essential to fine-tune the features and hyperparameters to align with the unique spectral characteristics and spatial distribution patterns of the new species. Additionally, this research indicates that determining the optimal scale for analysis by identifying the most effective GP level is crucial. With these customizations, the updated model could then be utilized to map the presence of the new target species across extended regions.
Future work should focus on refining the categories used to define the relative abundance of spotted knapweed through additional data collection and stratified spatial sampling methods, evaluating the generalizability of the models developed using this method over larger spatial extents, and extending its application to other invasive species and habitat types. Additionally, the development of user-friendly tools and interfaces for land managers to utilize this model can facilitate its widespread adoption in conservation and land management sectors.
Future Work - Expanding Methodologies

As we progress, multiple prospects for continued research and enhancement in this field emerge:

Generative Adversarial Networks for Training Data Augmentation

The use of generative adversarial networks (GANs) to supplement the training dataset is a significant direction for future research. A balanced training set was imperative in this study to avoid category bias. To achieve a balanced training set, however, the data available for training were downsampled, selecting only as many examples of each category as were available in the minority class. This resulted in a small training set of 150 examples. With such a small number of training examples, the machine learning classifiers used in this study (i.e., random forest and support vector machine) would not be able to utilize the information available from a large set of input features, such as the 84 features used in the single GP-level models.

A GAN is a framework for generative artificial intelligence in which two artificial neural networks (ANNs) compete against each other. One ANN, the generator, seeks to create synthetic examples that match a pattern in the input data, while the other ANN, the discriminator, seeks to discriminate between examples synthesized by the generator and those existing in the input set. The introduction of GANs could address the limitation of a small training set by generating synthetic yet plausible training examples that mimic the patterns in the true training examples. Employing a GAN to oversample the true training data could enhance the training process while avoiding the risk of overfitting that sample duplication entails. This technique allows the training set to be expanded, potentially to 300 samples, evenly distributed across categories. Such expansion could improve the model's generalization and strengthen the predictions of spotted knapweed prevalence.
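The class-balancing step described above, in which every category is downsampled to the size of the minority class, can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name, the class labels, and the class counts are hypothetical and are not taken from the thesis code or data.

```python
import random
from collections import Counter, defaultdict


def downsample_to_minority(samples, labels, seed=0):
    """Randomly keep as many samples per class as the smallest class contains."""
    rng = random.Random(seed)

    # Group the samples by their class label.
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    # The minority class sets the number of samples kept for every class.
    n_min = min(len(group) for group in by_class.values())

    balanced_samples, balanced_labels = [], []
    for label in sorted(by_class):
        # Sample without replacement so that no example is duplicated.
        balanced_samples.extend(rng.sample(by_class[label], n_min))
        balanced_labels.extend([label] * n_min)
    return balanced_samples, balanced_labels


if __name__ == "__main__":
    # Hypothetical, imbalanced label set: 9, 6 and 4 examples per class.
    labels = ["A"] * 9 + ["B"] * 6 + ["C"] * 4
    samples = list(range(len(labels)))
    _, balanced_labels = downsample_to_minority(samples, labels)
    print(Counter(balanced_labels))
```

Because the size of the balanced set is fixed by the rarest category, every additional example of that category, whether collected in the field or synthesized, raises the ceiling for all categories. This is the motivation for the GAN-based oversampling proposed above.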
Feature Optimization through Data Compression

Data compression methods such as principal component analysis (PCA) and autoencoders represent an avenue for refining feature selection in future research. Although recursive feature elimination with cross-validation (RFECV) was effective in identifying an optimal feature set in this research, PCA and autoencoders present alternative means of isolating pivotal features within a dataset. PCA can decrease the dimensionality of data while retaining most of its variance, which may reveal the intrinsic structure of the data. Autoencoders, which are specialized ANNs, use a deep network to encode input data efficiently in a central encoding layer and subsequently reconstruct the input from this encoding. Training autoencoders can lead to the extraction of a feature set that encapsulates the details necessary for precise predictions. These techniques are especially beneficial for high-dimensional data, where the aim is to distill the feature set without forfeiting information critical to the classification task.

Compressing the feature space could aid in distilling the relevant multiscale features present throughout the scale-space, resulting in a more streamlined set of input features for evaluation with RFECV. Not only would this reduce the computational demand of classifier training, but it could also result in the learning of more accurate classifiers. Such advancements could markedly improve the mapping and monitoring of invasive species like spotted knapweed in grassland ecosystems.

The adoption of these methods is expected to expand the toolkit available for remote sensing analyses and species distribution modeling. Future studies can build upon the groundwork established by this thesis to further enhance our proficiency in ecological conservation and management, safeguarding the fragile equilibria in our natural habitats.
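As a rough illustration of the compression step proposed above, the following sketch implements PCA with a singular value decomposition in NumPy and keeps the smallest number of components that explains a chosen fraction of the variance. It is a sketch under stated assumptions, not the method used in this thesis: the function name, the 95% variance threshold, and the synthetic data (150 examples by 84 features, generated from five hypothetical latent factors) are all illustrative choices.

```python
import numpy as np


def pca_compress(features, variance_to_keep=0.95):
    """Project features onto the fewest principal components that retain
    at least `variance_to_keep` of the total variance."""
    X = np.asarray(features, dtype=float)
    centered = X - X.mean(axis=0)

    # Rows of `components` are the principal directions, ordered by decreasing variance.
    _, singular_values, components = np.linalg.svd(centered, full_matrices=False)
    variance_ratio = singular_values**2 / np.sum(singular_values**2)

    # Index of the first component at which the cumulative ratio reaches the threshold.
    n_components = int(np.searchsorted(np.cumsum(variance_ratio), variance_to_keep)) + 1
    n_components = min(n_components, len(variance_ratio))

    scores = centered @ components[:n_components].T
    return scores, components[:n_components], variance_ratio[:n_components]


if __name__ == "__main__":
    # Synthetic stand-in for a feature table: 150 examples, 84 correlated features.
    rng = np.random.default_rng(0)
    latent = rng.normal(size=(150, 5))
    X = latent @ rng.normal(size=(5, 84)) + 0.01 * rng.normal(size=(150, 84))

    scores, components, ratio = pca_compress(X, variance_to_keep=0.95)
    print("compressed shape:", scores.shape)
    print("variance retained:", round(float(ratio.sum()), 4))
```

In the workflow suggested above, the compressed scores, rather than the original features, would then be passed to RFECV. When the input features have different units or ranges, standardizing them before PCA is usually advisable, since otherwise the features with the largest variance dominate the leading components.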
Other suggestions

Deep learning approaches, such as convolutional neural networks (CNNs), and their performance analysis across various spatial resolutions could provide additional insights and might even lead to the automation of identifying the optimal spatial scale for feature extraction. Exploring scale-space representations of other image datasets, like hyperspectral imagery, could enrich the dataset and potentially improve classification models. With continuous advancements in sensor technology, assessing their performance across different scales will be essential, including the impact of new sensors on scale-space representation and feature selection. Future research should also consider the operational implications, such as the effects of altering flight altitude on the efficiency and cost-effectiveness of RPAS-based imaging surveys. The concepts derived from this study might also have more extensive applications in spatial analysis, extending to urban planning, disaster management, and geological surveys. By delving into these areas of future work, the field can evolve toward more sophisticated, precise, and efficient environmental monitoring and analysis methods. The ultimate aim is to establish a set of best practices for scale-space representation in remote sensing that can be tailored to diverse situations and applications.