Utilizing distributions of variable influence for feature selection in hyperspectral images
Walton, Neil Stewart
Optical sensing is an important tool in many domains. Hyperspectral imaging in particular has enjoyed success in tasks ranging from plant species classification to ripeness evaluation in produce. Although effective, hyperspectral imaging can be prohibitively expensive to deploy at scale. In the first half of this thesis, we develop a method to assist in designing a low-cost multispectral imager for produce monitoring. The method uses a genetic algorithm (GA) that simultaneously selects a subset of informative wavelengths and identifies effective filter bandwidths for such an imager. Instead of selecting the single fittest member of the final population as our solution, we fit a univariate Gaussian mixture model to a histogram of the overall GA population and select the wavelengths associated with the peaks of the fitted distributions. Because we evaluate the entire population rather than a single solution, we can also specify filter bandwidths by calculating the standard deviation of each Gaussian component and computing the corresponding full-width at half-maximum value. In our experiments, we find that this novel histogram-based method for feature selection is effective when compared to both the standard GA and partial least squares discriminant analysis.
In the second half of this thesis, we investigate how common feature selection frameworks such as feature ranking, forward selection, and backward elimination break down when faced with the multicollinearity present in hyperspectral data. We then propose two novel algorithms, Variable Importance for Distribution-based Feature Selection (VI-DFS) and Layer-wise Relevance Propagation for Distribution-based Feature Selection (LRP-DFS), that make use of variable importance and feature relevance, respectively. Both methods operate by fitting Gaussian mixture models to the curves of their respective scores over the input wavelengths and selecting the wavelengths associated with the peak of each Gaussian component. In our experiments, we find that both novel methods outperform variable ranking, forward selection, and backward elimination and are competitive with the genetic algorithm over all datasets considered.
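
As an illustration of the histogram-based selection step, the following is a minimal sketch and not the thesis implementation. It assumes the wavelengths held by the GA population are available as a flat list of numbers, that the number of mixture components is chosen in advance, and that a plain expectation-maximization fit on those values is an acceptable stand-in for fitting the mixture to the histogram. The function names (`fit_gmm_1d`, `select_filters`) and the synthetic population are hypothetical. The only fixed fact used is that a Gaussian with standard deviation sigma has a full-width at half-maximum of 2*sqrt(2*ln 2)*sigma, roughly 2.355*sigma.

```python
import math
import random

# For a Gaussian, FWHM = 2 * sqrt(2 * ln 2) * sigma (about 2.355 * sigma).
FWHM_FACTOR = 2.0 * math.sqrt(2.0 * math.log(2.0))


def normal_pdf(x, mean, std):
    """Density of a univariate Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def fit_gmm_1d(samples, k, iterations=100, min_std=1e-3):
    """Fit a k-component univariate GMM with plain EM (hypothetical helper).

    Returns (weights, means, stds), each a list of length k.
    """
    ordered = sorted(samples)
    n = len(ordered)
    # Initialise the means at evenly spaced quantiles of the data.
    means = [ordered[int((i + 0.5) * n / k)] for i in range(k)]
    spread = (ordered[-1] - ordered[0]) / (2.0 * k) or 1.0
    stds = [spread] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in samples:
            dens = [w * normal_pdf(x, m, s)
                    for w, m, s in zip(weights, means, stds)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean, and standard deviation.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, samples)) / nj
            var = sum(r[j] * (x - means[j]) ** 2
                      for r, x in zip(resp, samples)) / nj
            stds[j] = max(math.sqrt(var), min_std)
    return weights, means, stds


def select_filters(samples, k):
    """Return sorted (center wavelength, FWHM bandwidth) pairs, one per component."""
    _, means, stds = fit_gmm_1d(samples, k)
    return sorted((m, FWHM_FACTOR * s) for m, s in zip(means, stds))


# Synthetic stand-in for the wavelengths (in nm) found in a final GA population.
rng = random.Random(0)
population = ([rng.gauss(550.0, 10.0) for _ in range(400)]
              + [rng.gauss(680.0, 15.0) for _ in range(400)]
              + [rng.gauss(850.0, 20.0) for _ in range(400)])

filters = select_filters(population, k=3)
for center, fwhm in filters:
    print(f"center = {center:.1f} nm, FWHM = {fwhm:.1f} nm")
```

Each component mean plays the role of a selected wavelength and each FWHM the role of a candidate filter bandwidth. The same peak-picking idea could in principle be applied to a curve of importance or relevance scores over wavelength, as in VI-DFS and LRP-DFS, though fitting a mixture to a score curve rather than to samples would need a different fitting routine than the one sketched here.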