Scholarship & Research
Permanent URI for this community: https://scholarworks.montana.edu/handle/1/1
Search Results
2 results
Item: Utilizing distributions of variable influence for feature selection in hyperspectral images (Montana State University - Bozeman, College of Engineering, 2019)
Walton, Neil Stewart; Chairperson, Graduate Committee: John Sheppard

Optical sensing has become an important tool in many different domains. In particular, hyperspectral imaging has enjoyed success in a variety of tasks, ranging from plant species classification to ripeness evaluation in produce. Although effective, hyperspectral imaging can be prohibitively expensive to deploy at scale. In the first half of this thesis, we develop a method to assist in designing a low-cost multispectral imager for produce monitoring, using a genetic algorithm (GA) that simultaneously selects a subset of informative wavelengths and identifies effective filter bandwidths for such an imager. Instead of selecting the single fittest member of the final population as our solution, we fit a univariate Gaussian mixture model to a histogram of the overall GA population and select the wavelengths associated with the peaks of the distributions. By evaluating the entire population, rather than a single solution, we are also able to specify filter bandwidths: we calculate the standard deviations of the Gaussian distributions and compute the corresponding full-width at half-maximum values. In our experiments, we find that this novel histogram-based method for feature selection is effective when compared to both the standard GA and partial least squares discriminant analysis.

In the second half of this thesis, we investigate how common feature selection frameworks such as feature ranking, forward selection, and backward elimination break down when faced with the multicollinearity present in hyperspectral data.
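The histogram-based selection step described in the first abstract can be sketched as follows. This is a minimal, illustrative reconstruction, not the thesis's implementation: the wavelength values, cluster parameters, and the simple EM fitting routine are all assumptions made for the example. The key idea it demonstrates is fitting a 1-D Gaussian mixture to the population of selected wavelengths and reading off band centers (means) and filter bandwidths (FWHM = 2√(2 ln 2)·σ).

```python
import numpy as np

# Hypothetical GA population: selected wavelengths clustering around two
# informative bands (the 680 nm / 970 nm centers are illustrative only).
rng = np.random.default_rng(0)
wavelengths = np.concatenate([
    rng.normal(680.0, 5.0, 300),   # illustrative red-edge cluster
    rng.normal(970.0, 8.0, 300),   # illustrative water-absorption cluster
])

def fit_gmm_1d(x, k=2, iters=200):
    """Fit a 1-D, k-component Gaussian mixture with a basic EM loop."""
    mu = np.quantile(x, np.linspace(0.1, 0.9, k))   # spread-out init
    sigma = np.full(k, x.std())
    pi = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibility of each component for each point
        dens = (pi * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2)
                / (sigma * np.sqrt(2 * np.pi)))
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and standard deviations
        nk = resp.sum(axis=0)
        pi = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    return mu, sigma

mu, sigma = fit_gmm_1d(wavelengths)
# Band centers are the mixture means; filter bandwidths are the FWHM values.
fwhm = 2.0 * np.sqrt(2.0 * np.log(2.0)) * sigma
for m, w in sorted(zip(mu, fwhm)):
    print(f"center {m:.1f} nm, bandwidth {w:.1f} nm")
```

The FWHM conversion is what turns a fitted standard deviation into a physically meaningful filter bandwidth for the multispectral imager design.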
We then propose two novel algorithms, Variable Importance for Distribution-based Feature Selection (VI-DFS) and Layer-wise Relevance Propagation for Distribution-based Feature Selection (LRP-DFS), which make use of variable importance and feature relevance, respectively. Both methods operate by fitting Gaussian mixture models to the plots of their respective scores over the input wavelengths and selecting the wavelengths associated with the peaks of each Gaussian component. In our experiments, we find that both novel methods outperform variable ranking, forward selection, and backward elimination, and are competitive with the genetic algorithm across all datasets considered.

Item: Improving a precision agriculture on-farm experimentation workflow through machine learning (Montana State University - Bozeman, College of Engineering, 2019)
Peerlinck, Amy; Chairperson, Graduate Committee: John Sheppard

Reducing environmental impact while simultaneously improving the net return of crops is one of the key goals of precision agriculture (PA). To this end, an on-farm experimentation workflow was created that focuses on reducing the applied nitrogen (N) rate through variable rate application (VRA). The first step in the process, after gathering initial data from the farmers, is to create experimental, randomly stratified N prescription maps. One of the farmers' main concerns with these maps is the large jumps in N rate between consecutive cells. To address this, we develop and apply a genetic algorithm that minimizes rate jumps while maintaining stratification across yield and protein bins. The ultimate goal of the on-farm experiments is to determine the final N rate to be applied, which is accomplished by optimizing a net return function based on yield and protein prediction. Currently, these predictions are often made with simple linear and non-linear regression models.
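The rate-jump minimization idea from the second abstract can be sketched with a toy GA. This is a hedged illustration, not the thesis's algorithm: the grid size, the rate levels, and the swap-mutation operator are assumptions. Because every candidate is a permutation of the same cell assignments, the count of each N rate (and hence the stratification) is preserved by construction, while the fitness penalizes N-rate jumps between neighboring cells.

```python
import numpy as np

def jump_cost(grid):
    """Sum of absolute N-rate differences between adjacent cells."""
    g = np.asarray(grid, dtype=float)
    return np.abs(np.diff(g, axis=0)).sum() + np.abs(np.diff(g, axis=1)).sum()

def ga_minimize_jumps(grid, generations=200, pop=30, seed=0):
    """Toy GA: rearrange cells (preserving rate counts) to reduce jumps."""
    rng = np.random.default_rng(seed)
    flat = np.asarray(grid).ravel()
    shape = np.asarray(grid).shape
    # Seed the population with the original map plus random permutations.
    population = [flat.copy()] + [rng.permutation(flat) for _ in range(pop - 1)]
    for _ in range(generations):
        scored = sorted(population, key=lambda f: jump_cost(f.reshape(shape)))
        survivors = scored[: pop // 2]            # elitist selection
        children = []
        for parent in survivors:
            child = parent.copy()
            i, j = rng.integers(0, len(child), size=2)
            child[i], child[j] = child[j], child[i]   # swap mutation
            children.append(child)
        population = survivors + children
    best = min(population, key=lambda f: jump_cost(f.reshape(shape)))
    return best.reshape(shape)

# Hypothetical 6x6 prescription map with three stratified N rates (lb/acre).
rates = np.random.default_rng(1).choice([40, 80, 120], size=(6, 6))
optimized = ga_minimize_jumps(rates)
```

The elitist survivors plus the seeded original map guarantee the optimized layout is never worse than the input; the real workflow additionally enforces stratification across yield and protein bins rather than just preserving rate counts.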
Our work introduces six machine learning (ML) models for improving this task: a single-layer feed-forward neural network (FFNN), a stacked auto-encoder (SAE), three different AdaBoost ensembles, and a bagging ensemble. The AdaBoost and bagging methods each use a single-layer FFNN as the weak model. Furthermore, a simple spatial analysis is performed to create spatial data sets that better represent the inherent spatial nature of the field data. These methods are applied to yield and protein data from four actual fields. The spatial data are shown to improve accuracy for most yield models. They do not perform as well on the protein data, possibly because the protein data sets are small and sparse, leading to potential overfitting of the models. When comparing the predictive models, the deep network performed better than the shallow network, and the ensemble methods outperformed both the SAE and a single FFNN. Of the four ensemble methods, bagging had the most consistent performance across the yield and protein data sets. Overall, spatial bagging using FFNNs as the weak learner has the best performance for both yield and protein prediction.
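The bagging-of-FFNNs approach the abstract identifies as the best performer can be sketched as follows. This is a minimal numpy illustration under stated assumptions: the synthetic "field" data, the two input features, and all hyperparameters (hidden size, learning rate, number of weak models) are invented for the example, not taken from the thesis. Each weak learner is a single-hidden-layer network trained by gradient descent on a bootstrap resample, and the ensemble prediction is the average.

```python
import numpy as np

def train_ffnn(x, y, hidden=8, epochs=1500, lr=0.05, seed=0):
    """Train one single-hidden-layer tanh regressor by full-batch GD."""
    rng = np.random.default_rng(seed)
    w1 = rng.normal(0, 0.5, (x.shape[1], hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0, 0.5, (hidden, 1))
    b2 = 0.0
    for _ in range(epochs):
        h = np.tanh(x @ w1 + b1)
        pred = h @ w2 + b2
        err = pred - y[:, None]               # dL/dpred for 0.5 * MSE
        grad_w2 = h.T @ err / len(x)
        grad_b2 = err.mean()
        dh = (err @ w2.T) * (1 - h ** 2)      # backprop through tanh
        grad_w1 = x.T @ dh / len(x)
        grad_b1 = dh.mean(axis=0)
        w1 -= lr * grad_w1; b1 -= lr * grad_b1
        w2 -= lr * grad_w2; b2 -= lr * grad_b2
    return lambda xq: (np.tanh(xq @ w1 + b1) @ w2 + b2).ravel()

def bagged_predict(x, y, xq, n_models=10):
    """Average the predictions of FFNNs trained on bootstrap resamples."""
    rng = np.random.default_rng(42)
    preds = []
    for m in range(n_models):
        idx = rng.integers(0, len(x), len(x))   # bootstrap resample
        preds.append(train_ffnn(x[idx], y[idx], seed=m)(xq))
    return np.mean(preds, axis=0)

# Toy stand-in for field data: predict yield from two illustrative
# features (e.g. applied N rate and elevation, both standardized).
rng = np.random.default_rng(7)
X = rng.uniform(-1, 1, (200, 2))
y = np.sin(2 * X[:, 0]) + 0.5 * X[:, 1] + rng.normal(0, 0.1, 200)
yhat = bagged_predict(X, y, X)
```

Averaging over bootstrap resamples reduces the variance of the individual networks, which is consistent with the abstract's observation that bagging gave the most consistent performance across data sets.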