# Scholarly Work - Computer Science

Permanent URI for this collection: https://scholarworks.montana.edu/handle/1/3034

## Browse

### Recent Submissions

**Advancing Retail Predictions: Integrating Diverse Machine Learning Models for Accurate Walmart Sales Forecasting** (Sciencedomain International, 2024-06). C., Cyril Neba; F., Gerard Shu; Nsuh, Gillian; A., Philip Amouda; F., Adrian Neba; Webnda, F.; Ikpe, Victory; Orelaja, Adeyinka; Sylla, Nabintou Anissia.

In the rapidly evolving landscape of retail analytics, the accurate prediction of sales figures holds paramount importance for informed decision-making and operational optimization. Leveraging diverse machine learning methodologies, this study aims to enhance the precision of Walmart sales forecasting, utilizing a comprehensive dataset sourced from Kaggle. Exploratory data analysis reveals intricate patterns and temporal dependencies within the data, prompting the adoption of advanced predictive modeling techniques. Through the implementation of linear regression and ensemble methods such as Random Forest, Gradient Boosting Machines (GBM), eXtreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM), this research endeavors to identify the most effective approach for predicting Walmart sales. Comparative analysis of model performance showcases the superiority of advanced machine learning algorithms over traditional linear models. The results indicate that XGBoost emerges as the optimal predictor for sales forecasting, boasting the lowest Mean Absolute Error (MAE) of 1226.471, a Root Mean Squared Error (RMSE) of 1700.981, and an exceptionally high R-squared value of 0.9999900, indicating near-perfect predictive accuracy. This model's performance significantly surpasses that of simpler models such as linear regression, which yielded an MAE of 35632.510 and an RMSE of 80153.858. Insights from bias and fairness measurements underscore the effectiveness of advanced models in mitigating bias and delivering equitable predictions across temporal segments.

Our analysis revealed varying levels of bias across the different models. Linear Regression, Multiple Regression, and GLM exhibited moderate bias, suggesting some systematic errors in predictions. Decision Tree showed slightly higher bias, while Random Forest demonstrated a unique scenario of negative bias, implying systematic underestimation of predictions. Models such as GBM, XGBoost, and LGB displayed biases closer to zero, indicating more accurate predictions with minimal systematic error. Notably, the XGBoost model demonstrated the lowest bias, with an MAE of -7.548432 (Table 4), reflecting its superior ability to minimize prediction errors across different conditions. Additionally, fairness analysis revealed that XGBoost maintained robust performance in both holiday and non-holiday periods, with an MAE of 84273.385 for holidays and 1757.721 for non-holidays. The fairness measurements also showed that Linear Regression, Multiple Regression, and GLM delivered consistent predictive performance across both subgroups, and that Decision Tree performed similarly for holiday predictions but exhibited better accuracy for non-holiday sales, whereas Random Forest, XGBoost, GBM, and LGB displayed lower MAE values for the non-holiday subgroup, indicating potential fairness issues in predicting holiday sales. The study also highlights the importance of model selection and the impact of advanced machine learning techniques on achieving high predictive accuracy and fairness. Ensemble methods such as Random Forest and GBM also showed strong performance, with Random Forest achieving an MAE of 12238.782 and an RMSE of 19814.965, and GBM achieving an MAE of 10839.822 and an RMSE of 1700.981. This research emphasizes the significance of leveraging sophisticated analytics tools to navigate the complexities of retail operations and drive strategic decision-making.
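The error metrics reported above (MAE, RMSE, R-squared) are standard; as a quick reference, here is a minimal pure-Python sketch of how they are computed. The sales numbers below are made up for illustration, not the study's data.

```python
import math

def mae(y_true, y_pred):
    """Mean Absolute Error: average magnitude of prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root Mean Squared Error: penalizes large errors more than MAE."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Toy weekly sales figures (hypothetical numbers)
actual    = [21000.0, 24500.0, 19800.0, 30200.0]
predicted = [20850.0, 24700.0, 19650.0, 30500.0]
print(mae(actual, predicted), rmse(actual, predicted), r_squared(actual, predicted))
```

Note that RMSE is always at least as large as MAE on the same data, which is why the two are typically reported together.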
By utilizing advanced machine learning models, retailers can achieve more accurate sales forecasts, ultimately leading to better inventory management and enhanced operational efficiency. The study reaffirms the transformative potential of data-driven approaches in driving business growth and innovation in the retail sector.

**Width Helps and Hinders Splitting Flows** (Association for Computing Machinery, 2024-01). Cáceres, Manuel; Cairo, Massimo; Grigorjew, Andreas; Khan, Shahbaz; Mumey, Brendan; Rizzi, Romeo; Tomescu, Alexandru I.; Williams, Lucia.

Minimum flow decomposition (MFD) is the NP-hard problem of finding a smallest decomposition of a network flow/circulation X on a directed graph G into weighted source-to-sink paths whose weighted sum equals X. We show that, for acyclic graphs, considering the width of the graph (the minimum number of paths needed to cover all of its edges) yields advances in our understanding of its approximability. For the version of the problem that uses only non-negative weights, we identify and characterise a new class of width-stable graphs, for which a popular heuristic is an O(log Val(X))-approximation (Val(X) being the total flow of X), and strengthen its worst-case approximation ratio from Ω(√m) to Ω(m / log m) for sparse graphs, where m is the number of edges in the graph. We also study a new problem on graphs with cycles, Minimum Cost Circulation Decomposition (MCCD), and show that it generalises MFD through a simple reduction. For the version that also allows negative weights, we give a (⌈log ‖X‖⌉ + 1)-approximation (‖X‖ being the maximum absolute value of X on any edge) using a power-of-two approach, combined with parity-fixing arguments and a decomposition of unitary circulations (‖X‖ ≤ 1), using a generalised notion of width for this problem. Finally, we disprove a conjecture about the linear independence of minimum (non-negative) flow decompositions posed by Kloster et al. [2018], but show that its useful implication (polynomial-time assignment of weights to a given set of paths to decompose a flow) holds for the negative version.

**Hyperspectral Band Selection for Multispectral Image Classification with Convolutional Networks** (IEEE, 2021-07). Morales, Giorgio; Sheppard, John; Logan, Riley; Shaw, Joseph.

In recent years, Hyperspectral Imaging (HSI) has become a powerful source for reliable data in applications such as remote sensing, agriculture, and biomedicine. However, hyperspectral images are highly data-dense and often benefit from methods to reduce the number of spectral bands while retaining the most useful information for a specific application. We propose a novel band selection method to select a reduced set of wavelengths, obtained from an HSI system, in the context of image classification. Our approach consists of two main steps: the first utilizes a filter-based approach to find relevant spectral bands based on a collinearity analysis between a band and its neighbors. This analysis helps to remove redundant bands and dramatically reduces the search space. The second step applies a wrapper-based approach to select bands from the reduced set based on their information entropy values, and trains a compact Convolutional Neural Network (CNN) to evaluate the performance of the current selection. We present classification results obtained with our method and compare them to other feature selection methods on two hyperspectral image datasets. Additionally, we use the original hyperspectral data cube to simulate the process of using actual filters in a multispectral imager. We show that our method produces more suitable results for a multispectral sensor design.

**Counterfactual Explanations of Neural Network-Generated Response Curves** (IEEE, 2023-06). Morales, Giorgio; Sheppard, John.

Response curves exhibit the magnitude of the response of a sensitive system to a varying stimulus.
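Response curves like those described above can be sketched numerically: below, a hypothetical two-feature regression surface stands in for a trained network, and sweeping one input feature while holding the other fixed yields differently shaped curves. All function forms and values here are illustrative assumptions, not from the paper.

```python
def toy_model(x_active, x_passive):
    """Hypothetical regression surface standing in for a neural net.
    The second feature shifts the peak of the first feature's curve."""
    return x_passive * x_active - 0.5 * x_active ** 2

def response_curve(model, passive_value, grid):
    """System response as one feature sweeps the grid while the
    other feature is held fixed."""
    return [model(x, passive_value) for x in grid]

grid = [i / 10 for i in range(11)]           # swept feature in [0, 1]
curve_lo = response_curve(toy_model, 0.5, grid)
curve_hi = response_curve(toy_model, 2.0, grid)
# The fixed feature moves the curve's maximum: the peak sits at
# x = 0.5 for the first curve but at the boundary x = 1 for the second.
print(max(curve_lo), max(curve_hi))  # -> 0.125 1.5
```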
However, the response of such systems may be sensitive to multiple stimuli (i.e., input features) that are not necessarily independent. As a consequence, the shape of the response curves generated for a selected input feature (the "active feature") might depend on the values of the other input features (the "passive features"). In this work, we consider systems whose response is approximated using regression neural networks. We propose to use counterfactual explanations (CFEs) to identify the features with the highest relevance to the shape of response curves generated by neural network black boxes. CFEs are generated by a genetic algorithm-based approach that solves a multi-objective optimization problem. In particular, given a response curve generated for an active feature, a CFE finds the minimum combination of passive features that need to be modified to alter the shape of the response curve. We tested our method on a synthetic dataset with 1-D inputs and two crop yield prediction datasets with 2-D inputs. The relevance ranking of features and feature combinations obtained on the synthetic dataset coincided with the analysis of the equation used to generate the problem. Results obtained on the yield prediction datasets revealed that the impact of passive features on fertilizer responsivity depends on the terrain characteristics of each field.

**Metamorphic Testing For Machine Learning: Applicability, Challenges, and Research Opportunities** (IEEE, 2023-07). Rehman, Faqeer Ur; Srinivasan, Madhusudan.

The wide adoption and growth of Machine Learning (ML) have driven tremendous advancements in a number of fields, e.g., manufacturing, transportation, bioinformatics, and self-driving cars. Its ability to extract patterns from a large set of data and then use this knowledge to make future predictions is beyond the human imagination.
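As a concrete illustration of the metamorphic-testing idea (a hypothetical sketch, not taken from the paper): when no oracle tells us the "correct" prediction, we can still assert relations that must hold between runs. Here, a tiny 1-nearest-neighbour classifier must give the same answer when the training data is reordered, or when all features are uniformly rescaled.

```python
def predict_1nn(train, query):
    """Tiny 1-nearest-neighbour classifier (squared Euclidean distance).
    train: list of (feature_vector, label) pairs."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(train, key=lambda ex: dist2(ex[0], query))[1]

def scale(vec, k):
    return [k * x for x in vec]

# Source test case: we have no oracle for the "correct" label,
# but we know relations any correct implementation must satisfy.
train = [([0.0, 0.0], "a"), ([5.0, 5.0], "b"), ([9.0, 1.0], "c")]
query = [4.0, 4.5]
source_output = predict_1nn(train, query)

# MR1: reordering the training set must not change the prediction.
followup1 = predict_1nn(list(reversed(train)), query)

# MR2: uniformly scaling every feature by k > 0 must not change it either.
k = 3.0
followup2 = predict_1nn([(scale(f, k), lbl) for f, lbl in train], scale(query, k))

assert source_output == followup1 == followup2
print("metamorphic relations hold; prediction:", source_output)
```

If either assertion ever fails, a bug has been found without knowing the expected label in advance, which is exactly the point of MT.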
However, the complex calculations performed internally make these systems suffer from the oracle problem; it is thus hard to test them to identify bugs and enhance their quality. An application that is not properly tested can have disastrous consequences in the production environment. Metamorphic Testing (MT) has been widely accepted by researchers to address the oracle problem in testing both supervised and unsupervised ML-based systems. However, MT has several limitations (when used for testing ML) that the existing literature does not capture in a centralized place. Applying MT to test ML-based critical systems without prior knowledge of those limitations can cost organizations wasted time and resources. In this study, we highlight those limitations to help both researchers and practitioners be aware of them for better testing of ML applications. This paper makes the following contributions: i) providing insights into the various challenges faced in testing ML-based solutions, ii) highlighting a number of key challenges faced when applying MT to test ML applications, and iii) presenting potential future research opportunities and directions for the research community to address them.

**An Empirical Internet Protocol Network Intrusion Detection using Isolation Forest and One-Class Support Vector Machines** (The Science and Information Organization, 2023-01). Shu Fuhnwi, Gerard; Adedoyin, Victoria; Agbaje, Janet O.

With the increasing reliance on web-based applications and services, network intrusion detection has become a critical aspect of maintaining the security and integrity of computer networks. This study empirically investigates internet protocol network intrusion detection using two machine learning techniques: Isolation Forest (IF) and One-Class Support Vector Machines (OC-SVM), combined with ANOVA F-test feature selection.
This paper presents an empirical study comparing the effectiveness of two machine learning algorithms, Isolation Forest (IF) and One-Class Support Vector Machines (OC-SVM), with ANOVA F-test feature selection in detecting network intrusions using web services. The study used the NSL-KDD dataset, encompassing hypertext transfer protocol (HTTP), simple mail transfer protocol (SMTP), and file transfer protocol (FTP) web service attacks and normal traffic patterns, to comprehensively evaluate the algorithms. The performance of the algorithms is evaluated using several metrics: the F1-score, detection rate (recall), precision, false alarm rate (FAR), and Area Under the Receiver Operating Characteristic (AUCROC) curve. Additionally, the study investigates the impact of different hyper-parameters on the performance of both algorithms. Our empirical results demonstrate that while both IF and OC-SVM exhibit high efficacy in detecting network intrusion attacks on web services of type HTTP, SMTP, and FTP, One-Class Support Vector Machines outperform Isolation Forest in terms of F1-score (SMTP), detection rate (HTTP, SMTP, and FTP), AUCROC, and a consistently low false alarm rate (HTTP). A t-test confirmed that OC-SVM statistically outperforms IF on detection rate and FAR.

**Sex Parity in Cognitive Fatigue Model Development for Effective Human-Robot Collaboration** (IEEE, 2022-10). Kalatzis, Apostolos; Hopko, Sarah; Mehta, Ranjana K.; Stanley, Laura; Wittie, Mike P.

In recent years, robots have become vital to achieving manufacturing competitiveness. Especially in industrial environments, a strong level of interaction is reached when humans and robots form a dynamic system that works together towards achieving a common goal or accomplishing a task. However, human-robot collaboration can be cognitively demanding, potentially contributing to cognitive fatigue.
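For reference, the evaluation metrics used in the intrusion-detection study above (precision, detection rate, F1-score, FAR) all derive from confusion-matrix counts. A minimal sketch, with made-up counts rather than the study's results:

```python
def detection_metrics(tp, fp, tn, fn):
    """Standard intrusion-detection metrics from confusion-matrix
    counts, treating 'attack' as the positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # detection rate (DR)
    f1 = 2 * precision * recall / (precision + recall)
    far = fp / (fp + tn)               # false alarm rate
    return {"precision": precision, "recall": recall, "f1": f1, "far": far}

# Hypothetical counts for one traffic type (illustrative only)
m = detection_metrics(tp=90, fp=5, tn=95, fn=10)
print(m)
```

A low FAR matters in this setting because every false alarm consumes analyst time on benign traffic.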
Therefore, the consideration of cognitive fatigue becomes particularly important to ensure efficiency and safety in the overall human-robot collaboration. Additionally, sex is an inevitable human factor that needs further investigation for machine learning model development, given the perceptual and physiological differences between the sexes in responding to fatigue. As such, this study explored sex differences and labeling strategies in the development of machine learning models for cognitive fatigue detection. Sixteen participants, balanced by sex, were recruited to perform a surface finishing task with a UR10 collaborative robot under fatigued and non-fatigued states. Fatigue perception and heart rate activity data were collected throughout to create a dataset for cognitive fatigue detection. Equitable machine learning models were developed based on perception (survey responses) and condition (fatigue manipulation). The labeling approach had a significant impact on accuracy and F1-score, where perception-based labels led to lower accuracy and F1-score for females, likely due to sex differences in the reporting of fatigue. Additionally, we observed a relationship between heart rate, algorithm type, and labeling approach, where heart rate was the most significant predictor for the two labeling approaches and for all the algorithms utilized. Understanding the implications of label type, algorithm type, and sex on the design of fatigue detection algorithms is essential to designing equitable fatigue-adaptive human-robot collaborations across the sexes.

**Low-frequency Inductive Loop and Its Origin in the Impedance Spectrum of a Graphite Anode** (The Electrochemical Society, 2022-11). Thapa, Arun; Gao, Hongwei.

Graphite is a well-known anode material for commercial lithium-ion batteries, and its physical and electrochemical properties have been studied extensively.
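The equity comparison in the fatigue study above reduces to scoring a model separately per subgroup and inspecting the gap. A sketch with hypothetical labels (not the study's data):

```python
def subgroup_scores(y_true, y_pred, groups):
    """Accuracy per subgroup (e.g., sex) for a binary fatigue label.
    A large gap between subgroups signals an equity problem."""
    scores = {}
    for g in set(groups):
        idx = [i for i, gg in enumerate(groups) if gg == g]
        correct = sum(y_true[i] == y_pred[i] for i in idx)
        scores[g] = correct / len(idx)
    return scores

# Hypothetical predictions under a perception-based labeling scheme
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 0]
groups = ["F", "F", "F", "F", "M", "M", "M", "M"]
print(subgroup_scores(y_true, y_pred, groups))
```

The same per-group split applies unchanged to F1-score or any other metric one wishes to audit.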
However, the origin of an inductive loop observed in the low-frequency region of the Nyquist complex-plane impedance spectrum of the graphite anode has been widely debated and attributed to contrasting causes. This paper investigates the impedance spectrum of the graphite anode at various states of charge (SoCs) using three-electrode galvanostatic Electrochemical Impedance Spectroscopy (EIS) and further explores the impedance response of the electrolyte as a function of frequency. The graphite anode EIS measurement displayed an inductive loop in the low-frequency region at almost all SoCs, irrespective of the solid electrolyte interphase (SEI) age. To study the origin of this inductive loop in the graphite impedance spectrum, we fabricated a three-electrode pouch cell with graphite and NMC electrodes and estimated the electrolyte impedance in the frequency range from 1 MHz to 0.05 Hz. The electrolyte impedance at low frequencies exhibited inductive behavior, indicating a significant role of the electrolyte in the origin of the inductive characteristic in the low-frequency region of the graphite EIS spectrum.

**Improved Yield Prediction of Winter Wheat Using a Novel Two-Dimensional Deep Regression Neural Network Trained via Remote Sensing** (MDPI AG, 2023-01). Morales, Giorgio; Sheppard, John W.; Hegedus, Paul B.; Maxwell, Bruce D.

In recent years, the use of remotely sensed and on-ground observations of crop fields, in conjunction with machine learning techniques, has led to highly accurate crop yield estimations. In this work, we propose to further improve the yield prediction task by using Convolutional Neural Networks (CNNs), given their unique ability to exploit the spatial information of small regions of the field.
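The low-frequency inductive behaviour described in the impedance study above can be illustrated numerically. The equivalent circuit below is a hypothetical toy (a series resistance, an R‖C charge-transfer pair, and an R‖L pair), chosen only so that the imaginary impedance turns positive at low frequency; it is not the paper's model, and all element values are invented.

```python
import math

def z_cell(freq_hz, rs=0.02, rct=0.05, cdl=0.5, rl=0.03, l=0.2):
    """Complex impedance of a toy equivalent circuit: series R,
    a charge-transfer R||C pair, and an R||L pair whose inductive
    branch dominates at low frequency (hypothetical values)."""
    w = 2 * math.pi * freq_hz
    z_rc = rct / (1 + 1j * w * rct * cdl)
    z_rl = (rl * 1j * w * l) / (rl + 1j * w * l) if w > 0 else 0
    return rs + z_rc + z_rl

# Sweep from 1 kHz down to 20 mHz and watch the sign of Im(Z) flip
for f in [1000.0, 10.0, 0.1, 0.02]:
    z = z_cell(f)
    print(f"{f:8.2f} Hz  Re={z.real:.4f}  Im={z.imag:+.5f}")
```

In a Nyquist plot the negative-Im (capacitive) arc appears at mid frequencies, while the positive-Im points at the lowest frequencies trace the inductive loop.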
We present a novel CNN architecture called Hyper3DNetReg that takes a multi-channel input raster and, unlike previous approaches, outputs a two-dimensional raster in which each output pixel represents the predicted yield value of the corresponding input pixel. Our proposed method then generates a yield prediction map by aggregating the overlapping yield prediction patches obtained throughout the field. Our data consist of a set of eight rasterized remotely sensed features: nitrogen rate applied, precipitation, slope, elevation, topographic position index (TPI), aspect, and two radar backscatter coefficients acquired from the Sentinel-1 satellites. We use data collected during the early stage of the winter wheat growing season (March) to predict yield values during the harvest season (August). We present leave-one-out cross-validation experiments for rain-fed winter wheat over four fields and show that our proposed methodology produces better predictions than the compared methods: Bayesian multiple linear regression, standard multiple linear regression, random forest, an ensemble of feedforward networks using AdaBoost, a stacked autoencoder, and two other CNN architectures.

**Improving RNA Assembly via Safety and Completeness in Flow Decompositions** (Mary Ann Liebert Inc, 2022-12). Khan, Shahbaz; Kortelainen, Milla; Cáceres, Manuel; Williams, Lucia; Tomescu, Alexandru I.

Decomposing a network flow into weighted paths is a problem with numerous applications, ranging from networking and transportation planning to bioinformatics. In some applications, we look for a decomposition that is optimal with respect to some property, such as the number of paths used, robustness to edge deletion, or length of the longest path. However, in many bioinformatic applications, we seek a specific decomposition where the paths correspond to some underlying data that generated the flow.
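The patch-aggregation step described for Hyper3DNetReg above can be sketched as follows. Averaging the overlapping patches is an assumption of this sketch (the abstract states only that overlapping patches are aggregated), and the field dimensions and values are illustrative.

```python
def aggregate_patches(field_shape, patches):
    """Average overlapping 2-D prediction patches into one yield map.
    patches: list of (row, col, patch) where patch is a 2-D list
    placed with its top-left corner at (row, col)."""
    h, w = field_shape
    total = [[0.0] * w for _ in range(h)]
    count = [[0] * w for _ in range(h)]
    for r0, c0, patch in patches:
        for i, row in enumerate(patch):
            for j, v in enumerate(row):
                total[r0 + i][c0 + j] += v
                count[r0 + i][c0 + j] += 1
    return [[total[i][j] / count[i][j] if count[i][j] else 0.0
             for j in range(w)] for i in range(h)]

# Two 2x2 patches overlapping on the middle column of a 2x3 field
p1 = [[1.0, 2.0], [3.0, 4.0]]
p2 = [[4.0, 5.0], [6.0, 7.0]]
ymap = aggregate_patches((2, 3), [(0, 0, p1), (0, 1, p2)])
print(ymap)  # overlap cells hold the mean of the two patch values
```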
In these cases, no optimization criterion guarantees the identification of the correct decomposition. Therefore, we propose to instead report the safe paths, which are subpaths of at least one path in every flow decomposition. In this work, we give the first local characterization of safe paths for flow decompositions in directed acyclic graphs, leading to a practical algorithm for finding the complete set of safe paths. In addition, we evaluate our algorithm on RNA transcript data sets against a trivial safe algorithm (extended unitigs), the recently proposed safe paths for path covers (TCBB 2021), and the popular heuristic greedy-width. On the one hand, we found that besides maintaining perfect precision, our safe and complete algorithm reports significantly higher coverage (≈50% more) than the other safe algorithms. On the other hand, the greedy-width algorithm, although reporting better coverage, also reports significantly lower precision on complex graphs (for genes expressing a large number of transcripts). Overall, our safe and complete algorithm outperforms greedy-width (by ≈20%) on a unified metric (F-score) considering both coverage and precision when the evaluated data set has a significant number of complex graphs. Moreover, it also has superior time (4-5x) and space (1.2-2.2x) performance, resulting in a better and more practical approach for bioinformatic applications of flow decomposition.

**Efficient Minimum Flow Decomposition via Integer Linear Programming** (Mary Ann Liebert Inc, 2022-11). Dias, Fernando H.C.; Williams, Lucia; Mumey, Brendan; Tomescu, Alexandru I.

Minimum flow decomposition (MFD) is an NP-hard problem asking to decompose a network flow into a minimum set of paths (together with associated weights). Variants of it are powerful models in multiassembly problems in Bioinformatics, such as RNA assembly.
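The greedy-width heuristic used as a baseline above repeatedly strips the widest remaining source-to-sink path from the flow. A sketch for a tiny DAG; for simplicity it assumes vertex labels sort into a topological order, which is an assumption of this illustration.

```python
def widest_path(graph, s, t):
    """Max-bottleneck s->t path by dynamic programming over a
    topological order. graph: {u: {v: flow}} on a DAG whose vertex
    names happen to sort topologically (assumption of this sketch)."""
    order = sorted(graph)
    best = {v: 0 for v in order}
    pred = {}
    best[s] = float("inf")
    for u in order:
        for v, f in graph[u].items():
            b = min(best[u], f)
            if b > best[v]:
                best[v], pred[v] = b, u
    if best[t] == 0:
        return None, 0
    path, v = [t], t
    while v != s:
        v = pred[v]
        path.append(v)
    return path[::-1], best[t]

def greedy_width(graph, s, t):
    """Greedy-width heuristic: repeatedly remove the widest path."""
    paths = []
    while True:
        p, w = widest_path(graph, s, t)
        if not p:
            return paths
        for u, v in zip(p, p[1:]):
            graph[u][v] -= w
            if graph[u][v] == 0:
                del graph[u][v]
        paths.append((p, w))

g = {0: {1: 6, 2: 4}, 1: {3: 6}, 2: {3: 4}, 3: {}}
print(greedy_width(g, 0, 3))  # -> [([0, 1, 3], 6), ([0, 2, 3], 4)]
```

Greedy-width is fast but, as the comparison above notes, it is a heuristic: the decomposition it finds need not be minimum, and its paths need not be safe.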
Owing to its hardness, practical multiassembly tools either use heuristics or solve simpler, polynomial-time-solvable versions of the problem, which may yield solutions that are not minimal or do not perfectly decompose the flow. Here, we provide the first fast and exact solver for MFD on acyclic flow networks, based on Integer Linear Programming (ILP). Key to our approach is an encoding of all the exponentially many solution paths using only a quadratic number of variables. We also extend our ILP formulation to many practical variants, such as incorporating longer or paired-end reads, or minimizing flow errors. On both simulated and real-flow splicing graphs, our approach solves any instance in under 13 seconds. We hope that our formulations can lie at the core of future practical RNA assembly tools. Our implementations are freely available on GitHub.

**Computing the Tandem Duplication Distance is NP-Hard** (Society for Industrial & Applied Mathematics, 2022-03). Lafond, Manuel; Zhu, Binhai; Zou, Peng.

In computational biology, tandem duplication is an important biological phenomenon which can occur either at the genome or at the DNA level. A tandem duplication takes a copy of a genome segment and inserts it right after the segment; this can be represented as the string operation AXB ⇒ AXXB. Tandem exon duplications have been found in many species, such as human, fly, and worm, and have been studied extensively in computational biology. The tandem duplication (TD) distance problem we investigate in this paper is defined as follows: given two strings S and T over the same alphabet Σ, compute the smallest sequence of TDs required to convert S to T. The natural question of whether the TD distance can be computed in polynomial time was posed in 2004 by Leupold et al. and had remained open, despite the fact that TDs have received much attention ever since. In this paper, we focus on the special case when all characters of S are distinct.
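For intuition, the TD distance is easy to brute-force on tiny instances, even though the paper proves the general problem NP-hard. This sketch searches breadth-first over all duplications AXB ⇒ AXXB; it is exponential-time and intended purely as an illustration.

```python
from collections import deque

def td_distance(s, t, limit=6):
    """Breadth-first search for the smallest number of tandem
    duplications (A X B -> A X X B) turning s into t.
    Exponential-time: for intuition on tiny strings only."""
    if s == t:
        return 0
    seen = {s}
    frontier = deque([(s, 0)])
    while frontier:
        cur, d = frontier.popleft()
        if d >= limit:
            continue
        for i in range(len(cur)):
            for j in range(i + 1, len(cur) + 1):
                nxt = cur[:i] + cur[i:j] + cur[i:j] + cur[j:]
                if len(nxt) > len(t):
                    continue  # duplications only ever grow the string
                if nxt == t:
                    return d + 1
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, d + 1))
    return None  # not reachable within `limit` duplications

print(td_distance("ab", "abab"), td_distance("ab", "aabb"))  # -> 1 2
```

Because duplications never shrink the string, any intermediate longer than T can be pruned, which keeps the toy search finite.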
This is known as the exemplar TD distance, which is of special relevance in bioinformatics. We first prove that this problem is NP-hard when the alphabet size is unbounded, settling the 16-year-old open problem. We then show how to adapt the proof to |Σ| = 4, hence proving the NP-hardness of the TD problem for any |Σ| ≥ 4. One of the tools we develop for the reduction is a new problem called Cost-Effective Subgraph, for which we obtain W[1]-hardness results that might be of independent interest. We finally show that computing the exemplar TD distance between S and T is fixed-parameter tractable. Our results open the door to many other questions, and we conclude with several open problems.

**Dispersing and grouping points on planar segments** (Elsevier BV, 2022-09). He, Xiaozhou; Lai, Wenfeng; Zhu, Binhai; Zou, Peng.

Motivated by (continuous) facility location, we study the problem of dispersing and grouping points on a set of segments (of streets) in the plane. In the former problem, given a set of n disjoint line segments in the plane, we investigate the problem of computing a point on each of the n segments such that the minimum Euclidean distance between any two of these points is maximized. We prove that this 2D dispersion problem is NP-hard; in fact, it is NP-hard even if all the segments are parallel and of unit length. This is in contrast to the polynomial solvability of the corresponding 1D problem by Li and Wang (2016), where the intervals are in 1D and are all disjoint. With this result, we also show that the Independent Set problem on Colored Linear Unit Disk Graphs (meaning the convex hulls of points with the same color form disjoint line segments) remains NP-hard, and the parameterized version of it is in W[2]. In the latter problem, given a set of n disjoint line segments in the plane, we study the problem of computing a point on each of the n segments such that the maximum Euclidean distance between any two of these points is minimized.
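Although the 2D dispersion problem above is NP-hard, tiny instances can be brute-forced by discretizing each segment into candidate points. A sketch (the sampling makes this only an approximation of the continuous problem, and the instance is invented):

```python
from itertools import product

def min_pairwise(pts):
    """Smallest Euclidean distance over all pairs of points."""
    return min(((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5
               for i, a in enumerate(pts) for b in pts[i + 1:])

def disperse(segments, samples=5):
    """Pick one point per segment maximizing the minimum pairwise
    distance, trying `samples` evenly spaced candidates per segment."""
    def lerp(p, q, t):
        return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))
    candidates = [[lerp(p, q, k / (samples - 1)) for k in range(samples)]
                  for p, q in segments]
    return max(product(*candidates), key=min_pairwise)

# Three parallel unit segments: the hard case identified by the paper.
segs = [((0.0, 0.0), (1.0, 0.0)),
        ((0.0, 1.0), (1.0, 1.0)),
        ((0.0, 2.0), (1.0, 2.0))]
best = disperse(segs)
print(best, min_pairwise(best))
```

The exhaustive sweep is exponential in the number of segments, consistent with the hardness result; here the optimum alternates endpoints, giving a minimum pairwise distance of √2.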
We present a factor-1.1547 approximation algorithm for this problem. Our results can be generalized to the Manhattan distance.

**Gray Spectralon polarized reflectance deviations from Lambertian** (SPIE, 2022-06). Field, Nathaniel J.; Brown, Jarrod P.; Card, Darrel B.; Welsh, Chad M.; Van Rynbach, Andre J.; Shaw, Joseph A.

While Spectralon panels are largely assumed to be ideal Lambertian surfaces, their actual polarized reflective responses deviate from the ideal by at least a small amount at illumination and viewing angles off surface normal. The Mueller matrix responses of four different panels between 10% and 99% reflectance were measured, and the radiometric responses from two distinct monostatic or near-monostatic polarimeter systems are compared, one at Montana State University and one at the Air Force Research Lab. The deviations from an assumed ideal Lambertian surface are reported.

**Designing multi-phased CO2 capture and storage infrastructure deployments** (Elsevier BV, 2022-08). Jones, Erick C.; Yaw, Sean; Bennett, Jeffrey A.; Ogland-Hand, Jonathan D.; Strahan, Cooper; Middleton, Richard S.

CO2 capture and storage (CCS) is a climate change mitigation strategy aimed at reducing the amount of CO2 vented into the atmosphere by capturing CO2 emissions from industrial sources, transporting the CO2 via a dedicated pipeline network, and injecting it into geologic reservoirs. Designing CCS infrastructure is a complex problem requiring concurrent optimization of source selection, reservoir selection, and pipeline routing decisions. Current CCS infrastructure design methods assume that project parameters, including costs, capacities, and availability, remain constant throughout the project's lifespan. In this research, we introduce a novel, multi-phased CCS infrastructure design model that enables analysis of more complex scenarios in which project parameters vary across distinct phases.
We demonstrate the efficacy of our approach with theoretical analysis and an evaluation using real CCS infrastructure data.

**Safety in multi-assembly via paths appearing in all path covers of a DAG** (Institute of Electrical and Electronics Engineers, 2021-01). Cáceres, Manuel; Mumey, Brendan; Husic, Edin; Rizzi, Romeo; Cairo, Massimo; Sahlin, Kristoffer; Tomescu, Alexandru Ioan.

A multi-assembly problem asks to reconstruct multiple genomic sequences from mixed reads sequenced from all of them. Standard formulations of such problems model a solution as a path cover in a directed acyclic graph, namely a set of paths that together cover all vertices of the graph. Since multi-assembly problems admit multiple solutions in practice, we consider an approach commonly used in standard genome assembly: output only partial solutions (contigs, or safe paths) that appear in all path cover solutions. We study constrained path covers, a restriction on the path cover solution that incorporates practical constraints arising in multi-assembly problems. We give efficient algorithms finding all maximal safe paths for constrained path covers. We compute the safe paths of splicing graphs constructed from transcript annotations of different species. Our algorithms run in less than 15 seconds per species and report RNA contigs that are over 99% precise and up to 8 times longer than unitigs. Moreover, RNA contigs cover over 70% of the transcripts and their coding sequences in most cases. With their increased length relative to unitigs, high precision, and fast construction time, maximal safe paths can provide a better base set of sequences for transcript assembly programs.

**Flow Decomposition with Subpath Constraints** (Institute of Electrical and Electronics Engineers, 2022-01). Williams, Lucia; Tomescu, Alexandru Ioan; Mumey, Brendan.

Flow network decomposition is a natural model for problems where we are given a flow network arising from superimposing a set of weighted paths and would like to recover the underlying data, i.e., decompose the flow into the original paths and their weights. Thus, variations on flow decomposition are often used as subroutines in multiassembly problems such as RNA transcript assembly. In practice, we frequently have access to information beyond flow values in the form of subpaths, and many tools incorporate these heuristically. But despite acknowledging their utility in practice, previous work has not formally addressed the effect of subpath constraints on the accuracy of flow network decomposition approaches. We formalize the flow decomposition with subpath constraints problem, give the first algorithms for it, and study its usefulness for recovering ground truth decompositions. For finding a minimum decomposition, we propose both a heuristic and an FPT algorithm. Experiments on RNA transcript datasets show that, for instances with larger solution path sets, the addition of subpath constraints finds 13% more ground truth solutions when minimal decompositions are found exactly, and 30% more ground truth solutions when minimal decompositions are found heuristically.

**Scalable Algorithms for Designing CO2 Capture and Storage Infrastructure** (Springer Science and Business Media LLC, 2022). Whitman, Caleb; Yaw, Sean; Hoover, Brendan; Middleton, Richard.

CO2 capture and storage (CCS) is a climate change mitigation strategy that aims to reduce the amount of CO2 vented into the atmosphere from industrial processes. Designing cost-effective CCS infrastructure is critical to meeting CO2 emission reduction targets and is a computationally challenging problem. We formalize the computational problem of designing cost-effective CCS infrastructure and detail the fundamental intractability of designing CCS infrastructure as problem instances grow in size.
We explore the problem's relationship to the ecosystem of network design problems and introduce three novel algorithms for its solution. We evaluate our proposed algorithms against existing exact approaches for CCS infrastructure design and find that they all run in dramatically less time than the exact approaches and generate solutions that are very close to optimal. Decreasing the time it takes to determine CCS infrastructure designs will support national-level scenario analysis, risk and sensitivity assessments, and understanding of the impact of government policies (e.g., tax credits for CCS).

**Reduced-cost hyperspectral convolutional neural networks** (2020-09). Morales, Giorgio; Sheppard, John W.; Scherrer, Bryan; Shaw, Joseph A.

Hyperspectral imaging provides a useful tool for extracting complex information when visual spectral bands are not enough to solve certain tasks. However, processing hyperspectral images (HSIs) is usually computationally expensive due to the great amount of both spatial and spectral data they incorporate. We present a low-cost convolutional neural network designed for HSI classification. Its architecture consists of two parts: a series of densely connected three-dimensional (3-D) convolutions used as a feature extractor, and a series of two-dimensional (2-D) separable convolutions used as a spatial encoder. We show that this design involves fewer trainable parameters compared to other approaches, yet without detriment to its performance. Moreover, we achieve results comparable to the state of the art when testing our architecture on four public remote sensing datasets: Indian Pines, Pavia University, Salinas, and EuroSAT; and on a dataset of Kochia leaves (Bassia scoparia) with three different levels of herbicide resistance.
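The parameter savings from replacing standard convolutions with separable ones, part of the low-cost design above, are easy to quantify with the usual parameter-count formulas. The layer sizes below are arbitrary examples, not the paper's architecture.

```python
def conv2d_params(k, c_in, c_out, bias=True):
    """Parameters of a standard 2-D convolution with a k x k kernel."""
    return k * k * c_in * c_out + (c_out if bias else 0)

def separable_conv2d_params(k, c_in, c_out, bias=True):
    """Depthwise-separable convolution: a k x k depthwise pass (one
    filter per input channel) followed by a 1x1 pointwise projection."""
    depthwise = k * k * c_in + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise

k, c_in, c_out = 3, 64, 128
std = conv2d_params(k, c_in, c_out, bias=False)            # 73728
sep = separable_conv2d_params(k, c_in, c_out, bias=False)  # 8768
print(std, sep, round(std / sep, 1))
```

For this example layer the separable version needs roughly 8x fewer weights, which is the kind of reduction that makes a "low-cost" HSI network feasible.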
The source code and datasets are available online.

**Hyperspectral Band Selection for Multispectral Image Classification with Convolutional Networks** (2021). Morales, Giorgio; Sheppard, John W.; Logan, Riley D.; Shaw, Joseph A.

In recent years, Hyperspectral Imaging (HSI) has become a powerful source for reliable data in applications such as remote sensing, agriculture, and biomedicine. However, hyperspectral images are highly data-dense and often benefit from methods to reduce the number of spectral bands while retaining the most useful information for a specific application. We propose a novel band selection method to select a reduced set of wavelengths, obtained from an HSI system, in the context of image classification. Our approach consists of two main steps: the first utilizes a filter-based approach to find relevant spectral bands based on a collinearity analysis between a band and its neighbors. This analysis helps to remove redundant bands and dramatically reduces the search space. The second step applies a wrapper-based approach to select bands from the reduced set based on their information entropy values, and trains a compact Convolutional Neural Network (CNN) to evaluate the performance of the current selection. We present classification results obtained with our method and compare them to other feature selection methods on two hyperspectral image datasets. Additionally, we use the original hyperspectral data cube to simulate the process of using actual filters in a multispectral imager. We show that our method produces more suitable results for a multispectral sensor design.
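A minimal sketch of the two-step band selection idea above: a collinearity filter over neighbouring bands, followed by ranking the survivors by information entropy. In the paper a compact CNN, not entropy alone, scores the final selection; the correlation threshold, binning scheme, and toy data here are all assumptions of this sketch.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (sx * sy)

def filter_collinear(bands, threshold=0.95):
    """Step 1: drop a band when it is nearly collinear with the last
    band kept, shrinking the search space."""
    kept = [0]
    for i in range(1, len(bands)):
        if abs(pearson(bands[kept[-1]], bands[i])) < threshold:
            kept.append(i)
    return kept

def entropy(band, bins=8):
    """Step 2 (proxy): Shannon entropy (bits) of a band's histogram."""
    lo, hi = min(band), max(band)
    if hi == lo:
        return 0.0
    width = (hi - lo) / bins
    counts = {}
    for p in band:
        b = min(int((p - lo) / width), bins - 1)
        counts[b] = counts.get(b, 0) + 1
    n = len(band)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# Toy "bands" (pixel vectors); band 1 is a near-copy of band 0.
b0 = [0.10, 0.20, 0.30, 0.40, 0.50]
b1 = [0.11, 0.21, 0.31, 0.41, 0.51]   # collinear with b0
b2 = [0.90, 0.10, 0.70, 0.20, 0.60]   # carries different information
survivors = filter_collinear([b0, b1, b2])
ranked = sorted(survivors, key=lambda i: entropy([b0, b1, b2][i]), reverse=True)
print(survivors, ranked)
```

Filtering first is what makes the wrapper step affordable: the expensive evaluation only ever sees the decorrelated survivors.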
