Scholarly Work - Computer Science
Permanent URI for this collectionhttps://scholarworks.montana.edu/handle/1/3034
Browse
57 results
Search Results
Item Hyperspectral Band Selection for Multispectral Image Classification with Convolutional Networks(IEEE, 2021-07) Morales, Giorgio; Sheppard, John; Logan, Riley; Shaw, JosephIn recent years, Hyperspectral Imaging (HSI) has become a powerful source for reliable data in applications such as remote sensing, agriculture, and biomedicine. However, hyperspectral images are highly data-dense and often benefit from methods to reduce the number of spectral bands while retaining the most useful information for a specific application. We propose a novel band selection method to select a reduced set of wavelengths, obtained from an HSI system in the context of image classification. Our approach consists of two main steps: the first utilizes a filter-based approach to find relevant spectral bands based on a collinearity analysis between a band and its neighbors. This analysis helps to remove redundant bands and dramatically reduces the search space. The second step applies a wrapper-based approach to select bands from the reduced set based on their information entropy values, and trains a compact Convolutional Neural Network (CNN) to evaluate the performance of the current selection. We present classification results obtained from our method and compare them to other feature selection methods on two hyperspectral image datasets. Additionally, we use the original hyperspectral data cube to simulate the process of using actual filters in a multispectral imager. We show that our method produces more suitable results for a multispectral sensor design.Item Counterfactual Explanations of Neural Network-Generated Response Curves(IEEE, 2023-06) Morales, Giorgio; Sheppard, JohnResponse curves exhibit the magnitude of the response of a sensitive system to a varying stimulus. However, response of such systems may be sensitive to multiple stimuli (i.e., input features) that are not necessarily independent. As a consequence, the shape of response curves generated for a selected input feature (referred to as “active feature”) might depend on the values of the other input features (referred to as “passive features”). In this work we consider the case of systems whose response is approximated using regression neural networks. We propose to use counterfactual explanations (CFEs) for the identification of the features with the highest relevance on the shape of response curves generated by neural network black boxes. CFEs are generated by a genetic algorithm-based approach that solves a multi-objective optimization problem. In particular, given a response curve generated for an active feature, a CFE finds the minimum combination of passive features that need to be modified to alter the shape of the response curve. We tested our method on a synthetic dataset with 1-D inputs and two crop yield prediction datasets with 2-D inputs. The relevance ranking of features and feature combinations obtained on the synthetic dataset coincided with the analysis of the equation that was used to generate the problem. Results obtained on the yield prediction datasets revealed that the impact on fertilizer responsivity of passive features depends on the terrain characteristics of each field.Item Metamorphic Testing For Machine Learning: Applicability, Challenges, and Research Opportunities(IEEE, 2023-07) Rehman, Faqeer Ur; Srinivasan, MadhusudanThe wide adoption and growth of Machine Learning (ML) have made tremendous advancements in revolutionizing a number of fields i.e., manufacturing, transportation, bio-informatics, and self-driving cars. Its ability to extract patterns from a large set of data and then use this knowledge to make future predictions is beyond the human imagination. However, the complex calculations internally performed in them make these systems suffer from the oracle problem; thus, hard to test them for identifying bugs in them and enhancing their quality. An application not properly tested can have disastrous consequences in the production environment. Metamorphic Testing (MT) has been widely accepted by researchers to address the oracle problem in testing both supervised and unsupervised ML-based systems. However, MT has several limitations (when used for testing ML) that the existing literature lacks in capturing them in a centralized place. Applying MT to test ML-based critical systems without prior knowledge/understanding of those limitations can cost organizations a waste of time and resources. In this study, we highlight those limitations to help both the researchers and practitioners to be aware of them for better testing of ML applications. Our efforts result in making the following contributions in this paper, i) providing insights into various challenges faced in testing ML-based solutions, ii) highlighting a number of key challenges faced when applying MT to test ML applications, and iii) presenting the potential future research opportunities/directions for the research community to address them.Item An Empirical Internet Protocol Network Intrusion Detection using Isolation Forest and One-Class Support Vector Machines(The Science and Information Organization, 2023-01) Shu Fuhnwi, Gerard; Adedoyin, Victoria; Agbaje, Janet O.With the increasing reliance on web-based applications and services, network intrusion detection has become a critical aspect of maintaining the security and integrity of computer networks. This study empirically investigates internet protocol network intrusion detection using two machine learning techniques: Isolation Forest (IF) and One-Class Support Vector Machines (OC-SVM), combined with ANOVA F-test feature selection. This paper presents an empirical study comparing the effectiveness of two machine learning algorithms, Isolation Forest (IF) and One-Class Support Vector Machines (OC-SVM), with ANOVA F-test feature selection in detecting network intrusions using web services. The study used the NSL-KDD dataset, encompassing hypertext transfer protocol (HTTP), simple mail transfer protocol (SMTP), and file transfer protocol (FTP) web services attacks and normal traffic patterns, to comprehensively evaluate the algorithms. The performance of the algorithms is evaluated based on several metrics, such as the F1-score, detection rate (recall), precision, false alarm rate (FAR), and Area Under the Receiver Operating Characteristic (AUCROC) curve. Additionally, the study investigates the impact of different hyper-parameters on the performance of both algorithms. Our empirical results demonstrate that while both IF and OC-SVM exhibit high efficacy in detecting network intrusion attacks using web services of type HTTP, SMTP, and FTP, the One-Class Support Vector Machines outperform the Isolation Forest in terms of F1-score (SMTP), detection rate(HTTP, SMTP, and FTP), AUCROC, and a consistent low false alarm rate (HTTP). We used the t-test to determine that OCSVM statistically outperforms IF on DR and FAR.Item Sex Parity in Cognitive Fatigue Model Development for Effective Human-Robot Collaboration(IEEE, 2022-10) Kalatzis, Apostolos; Hopko, Sarah; Mehta, Ranjana K.; Stanley, Laura; Wittie, Mike P.In recent years, robots have become vital to achieving manufacturing competitiveness. Especially in industrial environments, a strong level of interaction is reached when humans and robots form a dynamic system that works together towards achieving a common goal or accomplishing a task. However, the human-robot collaboration can be cognitively demanding, potentially contributing to cognitive fatigue. Therefore, the consideration of cognitive fatigue becomes particularly important to ensure the efficiency and safety in the overall human-robot collaboration. Additionally, sex is an inevitable human factor that needs further investigation for machine learning model development given the perceptual and physiological differences between the sexes in responding to fatigue. As such, this study explored sex differences and labeling strategies in the development of machine learning models for cognitive fatigue detection. Sixteen participants, balanced by sex, recruited to perform a surface finishing task with a UR10 collaborative robot under fatigued and non-fatigued states. Fatigue perception and heart rate activity data collected throughout to create a dataset for cognitive fatigue detection. Equitable machine learning models developed based on perception (survey responses) and condition (fatigue manipulation). The labeling approach had a significant impact on the accuracy and F1-score, where perception-based labels lead to lower accuracy and F1-score for females likely due to sex differences in reporting of fatigue. Additionally, we observed a relationship between heart rate, algorithm type, and labeling approach, where heart rate was the most significant predictor for the two labeling approaches and for all the algorithms utilized. Understanding the implications of label type, algorithm type, and sex on the design of fatigue detection algorithms is essential to designing equitable fatigue-adaptive human-robot collaborations across the sexes.Item Low-frequency Inductive Loop and Its Origin in the Impedance Spectrum of a Graphite Anode(The Electrochemical Society, 2022-11) Thapa, Arun; Gao, HongweiGraphite is a well-known anode material for commercial lithium-ion batteries, and its physical and electrochemical properties have been studied extensively. However, the origin of an inductive loop observed in the low-frequency region of the Nyquist complex plane impedance spectrum of the graphite anode has been widely debated and attributed to contrasting reasons. This paper investigates the impedance spectrum of the graphite anode at various states of charge (SoCs) using three-electrode galvanostatic Electrochemical Impedance Spectroscopy (EIS) and further explores the impedance response of the electrolyte as a function of frequency. The graphite anode EIS measurement displayed an inductive loop in the low-frequency region for almost entire SoCs, irrespective of the solid electrolyte interphase (SEI) age. To study the origin of this inductive loop in the graphite impedance spectrum, we fabricated a three-electrode pouch cell with graphite and NMC electrodes and estimated the electrolyte impedance in the frequency range from 1 MHz to 0.05 Hz. The electrolyte impedance at low frequencies exhibited inductive behavior, indicating a significant role of the electrolyte in the origin of the inductive characteristic in the low-frequency region of the graphite EIS spectrum.Item Improved Yield Prediction of Winter Wheat Using a Novel Two-Dimensional Deep Regression Neural Network Trained via Remote Sensing(MDPI AG, 2023-01) Morales, Giorgio; Sheppard, John W.; Hedgedus, Paul B.; Maxwell, Bruce D.In recent years, the use of remotely sensed and on-ground observations of crop fields, in conjunction with machine learning techniques, has led to highly accurate crop yield estimations. In this work, we propose to further improve the yield prediction task by using Convolutional Neural Networks (CNNs) given their unique ability to exploit the spatial information of small regions of the field. We present a novel CNN architecture called Hyper3DNetReg that takes in a multi-channel input raster and, unlike previous approaches, outputs a two-dimensional raster, where each output pixel represents the predicted yield value of the corresponding input pixel. Our proposed method then generates a yield prediction map by aggregating the overlapping yield prediction patches obtained throughout the field. Our data consist of a set of eight rasterized remotely-sensed features: nitrogen rate applied, precipitation, slope, elevation, topographic position index (TPI), aspect, and two radar backscatter coefficients acquired from the Sentinel-1 satellites. We use data collected during the early stage of the winter wheat growing season (March) to predict yield values during the harvest season (August). We present leave-one-out cross-validation experiments for rain-fed winter wheat over four fields and show that our proposed methodology produces better predictions than five compared methods, including Bayesian multiple linear regression, standard multiple linear regression, random forest, an ensemble of feedforward networks using AdaBoost, a stacked autoencoder, and two other CNN architectures.Item Improving RNA Assembly via Safety and Completeness in Flow Decompositions(Mary Ann Liebert Inc, 2022-12) Khan, Shahbaz; Kortelainen, Milla; Cáceres, Manuel; Williams, Lucia; Tomescu, Alexandru I.Decomposing a network flow into weighted paths is a problem with numerous applications, ranging from networking, transportation planning, to bioinformatics. In some applications we look for a decomposition that is optimal with respect to some property, such as the number of paths used, robustness to edge deletion, or length of the longest path. However, in many bioinformatic applications, we seek a specific decomposition where the paths correspond to some underlying data that generated the flow. In these cases, no optimization criteria guarantee the identification of the correct decomposition. Therefore, we propose to instead report the safe paths, which are subpaths of at least one path in every flow decomposition. In this work, we give the first local characterization of safe paths for flow decompositions in directed acyclic graphs, leading to a practical algorithm for finding the complete set of safe paths. In addition, we evaluate our algorithm on RNA transcript data sets against a trivial safe algorithm (extended unitigs), the recently proposed safe paths for path covers (TCBB 2021) and the popular heuristic greedy-width. On the one hand, we found that besides maintaining perfect precision, our safe and complete algorithm reports a significantly higher coverage ( = 50% more) compared with the other safe algorithms. On the other hand, the greedy-width algorithm although reporting a better coverage, it also reports a significantly lower precision on complex graphs (for genes expressing a large number of transcripts). Overall, our safe and complete algorithm outperforms (by = 20%) greedy-width on a unified metric (F-score) considering both coverage and precision when the evaluated data set has a significant number of complex graphs. Moreover, it also has a superior time (4 - 5x) and space performance (1.2 - 2.2x), resulting in a better and more practical approach for bioinformatic applications of flow decomposition.Item Efficient Minimum Flow Decomposition via Integer Linear Programming(Mary Ann Liebert Inc, 2022-11) Dias, Fernando H.C.; Williams, Lucia; Mumey, Brendan; Tomescu, Alexandru I.Minimum flow decomposition (MFD) is an NP-hard problem asking to decompose a network flow into a minimum set of paths (together with associated weights). Variants of it are powerful models in multiassembly problems in Bioinformatics, such as RNA assembly. Owing to its hardness, practical multiassembly tools either use heuristics or solve simpler, polynomial time-solvable versions of the problem, which may yield solutions that are not minimal or do not perfectly decompose the flow. Here, we provide the first fast and exact solver for MFD on acyclic flow networks, based on Integer Linear Programming (ILP). Key to our approach is an encoding of all the exponentially many solution paths using only a quadratic number of variables. We also extend our ILP formulation to many practical variants, such as incorporating longer or paired-end reads, or minimizing flow errors. On both simulated and real-flow splicing graphs, our approach solves any instance in <13 seconds. We hope that our formulations can lie at the core of future practical RNA assembly tools. Our implementations are freely available on Github.Item Computing the Tandem Duplication Distance is NP-Hard(Society for Industrial & Applied Mathematics, 2022-03) Lafond, Manuel; Zhu, Binhai; Zou, PengIn computational biology, tandem duplication is an important biological phenomenon which can occur either at the genome or at the DNA level. A tandem duplication takes a copy of a genome segment and inserts it right after the segment---this can be represented as the string operation AXB⇒AXXB. Tandem exon duplications have been found in many species such as human, fly, and worm and have been largely studied in computational biology. The tandem duplication (TD) distance problem we investigate in this paper is defined as follows: given two strings S and T over the same alphabet Σ, compute the smallest sequence of TDs required to convert S to T. The natural question of whether the TD distance can be computed in polynomial time was posed in 2004 by Leupold et al. and had remained open, despite the fact that TDs have received much attention ever since. In this paper, we focus on the special case when all characters of S are distinct. This is known as the exemplar TD distance, which is of special relevance in bioinformatics. We first prove that this problem is NP-hard when the alphabet size is unbounded, settling the 16-year-old open problem. We then show how to adapt the proof to |Σ|=4, hence proving the NP-hardness of the TD problem for any |Σ|≥4. One of the tools we develop for the reduction is a new problem called Cost-Effective Subgraph, for which we obtain W[1]-hardness results that might be of independent interest. We finally show that computing the exemplar TD distance between S and T is fixed-parameter tractable. Our results open the door to many other questions, and we conclude with several open problems.