Theses and Dissertations at Montana State University (MSU)
Permanent URI for this collection: https://scholarworks.montana.edu/handle/1/733
Item String analysis and algorithms with genomic applications (Montana State University - Bozeman, College of Engineering, 2024) Liyana Ralalage, Adiesha Lakshan Liyanage; Chairperson, Graduate Committee: Binhai Zhu
In biology, genome rearrangements are mutations that change the gene content of a genome or the arrangement of the genes on a genome. Understanding how genome rearrangements occur can help us understand the evolutionary history of extant species, improve genetic engineering, and understand the basis of genetic diseases. In this dissertation, we explored four problems related to genome partitioning and to tandem duplication and deletion rearrangement operations. Our interest was focused on determining how difficult these problems are to solve and identifying efficient algorithms to solve them. The proposed problems were formulated as string problems and then analyzed using complexity theory. In the first chapter, we explored several variations of the F-strip recovery problem, called XSR-F and GSR-F, and their complexity under different parameters. We proved that the XSR-F problem is hard to solve unless we restrict the allowed block sizes to one size. We provided a polynomial-time algorithm for GSR-F under a fixed alphabet and fixed F. In the second and third chapters, we introduced two string problems, longest letter-duplicated subsequence (LLDS) and longest subsequence-repeated subsequence (LSRS), formulated as alternative formulations of the tandem-duplication distance problem that allow information to be extracted about segments of genes that may have undergone tandem duplication. We analyzed the complexity of their variations and devised efficient algorithms to solve them.
We proved that constrained versions of the LLDS and LSRS problems are NP-hard for parameter d ≥ 4, while the general versions are polynomially solvable, which hints that any variations closer to the original tandem-duplication distance problem are still hard to solve. In the final chapter, we delved into two heuristic algorithms designed to compute the genomic distance between two mitochondrial genomes and a heuristic algorithm to predict ancestral gene order under the TDRL (tandem-duplication random loss) model. We improved the previously studied method developed for permutation strings by tweaking heuristic choices aimed at calculating the minimum distance between two genomes so that it applies to non-permutation strings. These heuristic algorithms were implemented and tested on a real-world mitochondrial genome data set.

Item Improving the effectiveness of metamorphic testing using systematic test case generation (Montana State University - Bozeman, College of Engineering, 2024) Saha, Prashanta; Chairperson, Graduate Committee: Clemente Izurieta; This is a manuscript style paper that includes co-authored chapters.
Metamorphic testing is a well-known approach to tackling the oracle problem in software testing. This technique requires source test cases that serve as seeds for the generation of follow-up test cases. Systematic design of test cases is crucial for test quality. Thus, the source test case generation strategy can have a big impact on the fault detection effectiveness of metamorphic testing. Most previous studies on metamorphic testing have used either random test data or existing test cases as source test cases. There has been limited research on systematic source test case generation for metamorphic testing. This thesis explores innovative methods for enhancing the effectiveness of metamorphic testing through systematic generation of source test cases.
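As a minimal illustration of the source and follow-up test case idea described above (not taken from the thesis or from the METTester tool), the sketch below checks the metamorphic relation sin(x) = sin(π − x). The function names, the choice of relation, and the randomly generated source inputs are all assumptions made for illustration.

```python
import math
import random


def follow_up(x: float) -> float:
    """Derive a follow-up input from a source input for the relation sin(x) == sin(pi - x)."""
    return math.pi - x


def check_relation(program, source_inputs, tol: float = 1e-9):
    """Return the source inputs for which the metamorphic relation is violated.

    No oracle for the exact output is needed: only the relation between the
    source output and the follow-up output is checked.
    """
    violations = []
    for x in source_inputs:
        source_out = program(x)
        follow_out = program(follow_up(x))
        if not math.isclose(source_out, follow_out, abs_tol=tol):
            violations.append(x)
    return violations


def faulty_sin(x: float) -> float:
    """Hypothetical faulty program: a truncated Taylor series standing in for sin."""
    return x - x ** 3 / 6


# Random source test cases (the baseline strategy the abstract contrasts against).
random.seed(0)
source_cases = [random.uniform(-10.0, 10.0) for _ in range(100)]

print("violations for math.sin:", len(check_relation(math.sin, source_cases)))
print("violations for faulty_sin:", len(check_relation(faulty_sin, source_cases)))
```

A systematic strategy would replace the random `source_cases` list with inputs chosen to satisfy a coverage criterion; the relation check itself would stay the same.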
The thesis addresses the challenge of testing complex software systems, including numerical programs and machine learning applications, where traditional testing methods are limited by the absence of a reliable oracle. By focusing on structural and mutation coverage criteria and on characteristics of machine learning datasets, the research introduces strategies to generate source test cases that are more effective in fault detection than random test case generation. The proposed techniques include leveraging structural and mutation coverage for numerical programs and aligning random values with machine learning properties for supervised classifier applications. These techniques are integrated into the METTester tool, automating the process and potentially reducing testing costs by minimizing the test suite without sacrificing quality. The thesis demonstrates that tailored source test case generation can significantly improve the fault detection capabilities of metamorphic testing, offering substantial benefits in terms of cost efficiency and reliability in software testing.

Item From curves to words and back again: geometric computation of minimum-area homotopy (Montana State University - Bozeman, College of Engineering, 2024) McCoy, Bradley Allen; Chairperson, Graduate Committee: Brittany Fasy
Let gamma be a generic closed curve in the plane. The area of a homotopy is the area swept by the homotopy. We consider the problem of computing the minimum null-homotopy area of gamma. Samuel Blank, in his 1967 Ph.D. thesis, determined whether gamma is self-overlapping by geometrically constructing a combinatorial word from gamma. More recently, Zipei Nie, in an unpublished manuscript, computed the minimum homotopy area of gamma by constructing a combinatorial word algebraically. We provide a unified framework for working with both words and determine the settings under which Blank's word and Nie's word are equivalent.
Using this equivalence, we give a new geometric proof of the correctness of Nie's algorithm. Unlike previous work, our proof is constructive, which allows us to naturally compute the actual homotopy that realizes the minimum area. Furthermore, we contribute to the theory of self-overlapping curves by providing the first polynomial-time algorithm to compute a self-overlapping decomposition of any closed curve gamma with minimum area. Next, we describe the first polynomial-time implementation of an algorithm to compute the minimum homotopy area of a piecewise linear closed curve in the plane. We discuss how minimum homotopy area can be used as a similarity measure for curves and include experiments that compare the runtime of our algorithm to an implementation of the Fréchet distance. We then extend our algorithm for computing the minimum homotopy area in the plane to homotopic, non-intersecting, non-contractible curves on an orientable surface with positive genus. Finally, we consider the inverse problem of determining which combinatorial Blank words correspond to closed curves in the plane. We solve a special case of this problem and give an exponential-time algorithm for the general case.

Item Using software bill of materials for software supply chain security and its generation impact on vulnerability detection (Montana State University - Bozeman, College of Engineering, 2024) O'Donoghue, Eric Jeffery; Chairperson, Graduate Committee: Clemente Izurieta; This is a manuscript style paper that includes co-authored chapters.
Cybersecurity attacks threaten the lives and safety of individuals around the world. Improving defense mechanisms across all vulnerable surfaces is essential. Among these surfaces, the software supply chain (SSC) stands out as particularly vulnerable to cyber threats. This thesis investigates how Software Bill of Materials (SBOM) can be utilized to assess and improve the security of software supply chains.
An informal literature review reveals the paucity of studies utilizing SBOM to assess SSC security, which further motivates this research. Our research adopts the Goal/Question/Metric paradigm with two goals: first, to utilize SBOM technology to assess SSC security; second, to examine the impact of SBOM generation on vulnerability detection. The study unfolds in two phases. Initially, we introduce a novel approach to assess SSC security risks using SBOM technology. Utilizing the analysis tools Trivy and Grype, we identify vulnerabilities across a corpus of 1,151 SBOMs. The second phase investigates how SBOM generation affects vulnerability detection. We analyzed four SBOM corpora derived from 2,313 Docker images by varying the SBOM generation tools (Syft and Trivy) and formats (CycloneDX 1.5 and SPDX 2.3). Using SBOM analysis tools (Trivy, Grype, CVE-bin-tool), we investigated how the vulnerability findings for the same software artifact changed according to the SBOM generation tool and format. The first phase demonstrates the use of SBOMs in identifying SSC vulnerabilities, showcasing their utility in enhancing security postures. The subsequent analysis reveals significant discrepancies in vulnerability detection outcomes, influenced by SBOM generation tools and formats. These variations underscore the necessity for rigorous validation and enhancement of SBOM technologies to secure SSCs effectively. This thesis demonstrates the use of SBOMs in assessing the security of SSCs. We underscore the need for stringent standards and rigorous validation mechanisms to ensure the accuracy and reliability of SBOM data. We reveal how SBOM generation affects vulnerability detection, offering insights that enhanced SBOM methodologies can help improve security. While SBOM is promising for enhancing SSC security, it is clear that the SBOM space is immature.
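One simple way to quantify how findings for the same artifact differ across generation tools and formats is set overlap. The sketch below is an illustration under stated assumptions, not the method used in the thesis: the vulnerability identifiers are placeholders rather than real findings, and Jaccard similarity is a swapped-in measure chosen for brevity.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Placeholder findings for ONE hypothetical artifact, keyed by
# (generation tool, SBOM format). The IDs are invented for illustration.
findings = {
    ("syft", "cyclonedx-1.5"): {"VULN-1", "VULN-2", "VULN-3"},
    ("syft", "spdx-2.3"): {"VULN-1", "VULN-2"},
    ("trivy", "cyclonedx-1.5"): {"VULN-2", "VULN-3", "VULN-4"},
    ("trivy", "spdx-2.3"): {"VULN-2", "VULN-4"},
}


def pairwise_overlap(results: dict) -> dict:
    """Compute Jaccard similarity for every pair of (tool, format) variants."""
    return {
        (a, b): jaccard(results[a], results[b])
        for a, b in combinations(sorted(results), 2)
    }


for pair, score in pairwise_overlap(findings).items():
    print(pair, round(score, 2))
```

A score of 1.0 would mean two SBOM variants led to identical findings; lower scores flag the kind of discrepancy the abstract reports.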
Extensive development, validation, and verification of analysis tools, generation tools, and formats are required to improve the usefulness of SBOMs for SSC security.

Item Enabling real-time communications in resource-constrained networks (Montana State University - Bozeman, College of Engineering, 2023) Mekiker, Batuhan; Co-chairs, Graduate Committee: Clemente Izurieta and Mike Wittie
Internet of Things (IoT) applications require flexible and high-performance data channels, but many IoT networks can only support single-use-case applications, which limits their performance and flexibility for real-time and streaming applications. LoRa offers a flexible physical network layer but lacks the resource management needed in its link layer protocols to support real-time flows. My initial contribution, the Beartooth Relay Protocol (BRP), aims to address this problem by expanding the performance envelope of LoRa, making it suitable for a wide range of IoT applications, including those requiring real-time and streaming capabilities. However, the resource-limited nature of LoRa does not allow BRP to scale to multi-hop mesh network deployments while maintaining real-time streams. To address the limitations of BRP in supporting mesh network deployments and real-time streams beyond two hops, we focus on developing the second-generation Beartooth Radios, MKII, and the first-generation Beartooth Gateways. We utilize commercial off-the-shelf (COTS) components in the radios to provide a cost-effective, power-efficient, and compact solution for establishing real-time situational awareness. The self-healing mesh network provided with MKII and Gateways also enhances the reliability of the overall network, ensuring connectivity even in case of node failures.
By incorporating military information brokers, such as the Tactical Assault Kit (TAK), the Beartooth Gateway establishes a hybrid network between Beartooth radios, gateways, and other TAK-capable devices, ensuring compatibility with existing IP networks. Building upon the premise that voice communications are an integral part of real-time situational awareness (SA), the last part of my research focuses on assessing audio quality and the efficacy of audio codecs within bandwidth-constrained networks. Delving into voice communications in resource-constrained networks, my research contrasts the performance of Text-to-Speech (TTS) models with traditional audio codecs. I demonstrate that TTS models outperform audio-codec-compressed voice samples in quality while also managing scarce resources and available capacity more efficiently. By combining flexible link layer protocol elements in BRP, Beartooth MKII radios, Gateways, and insights on integrating TTS systems for voice communication, my research demonstrates a versatile and flexible solution that provides real-time application streams and critical situational awareness capabilities in bandwidth-constrained networks and mission-critical applications.

Item An evaluation of graph representation of programs for malware detection and categorization using graph-based machine learning methods (Montana State University - Bozeman, College of Engineering, 2023) Pearsall, Reese Andersen; Chairperson, Graduate Committee: Clemente Izurieta
With both new and reused malware being used in cyberattacks every day, there is a dire need for the ability to detect and categorize malware before damage can be done. Previous research has shown that graph-based machine learning algorithms can learn on graph representations of programs, such as a control flow graph, to better distinguish between malicious and benign programs and to detect malware.
Although there are many types of graph representations of programs, there has not been a comparison between these different graphs to see whether one performs better than the rest. This thesis provides a comparison between different graph representations of programs for both malware detection and categorization using graph-based machine learning methods. Four different graphs are evaluated: control flow graph generated via disassembly, control flow graph generated via symbolic execution, function call graph, and data dependency graph. This thesis also describes a pipeline for creating a classifier for malware detection and categorization. Graphs are generated using the binary analysis tool angr, and their embeddings are calculated using the Graph2Vec graph embedding algorithm. The embeddings are plotted and clustered using K-means. A classifier is then built by assigning labels to clusters and the points within each cluster. We collected 2500 malicious executables and 2500 benign executables, and each of the four graph types is generated for each executable. Each graph type is fed into its own pipeline. A classifier for each of the four graph types is built, and classification metrics (e.g., F1 score) are calculated. The results show that control flow graphs generated from symbolic execution had the highest F1 score of the four graph representations. Using the pipeline built on control flow graphs generated from symbolic execution, the classifier categorized trojan malware most accurately.

Item Robust compression and classification of hyperspectral images (Montana State University - Bozeman, College of Engineering, 2022) Webster, Kyle Logan; Chairperson, Graduate Committee: John Sheppard
Hyperspectral images are a powerful source of spectral data that have been utilized in a wide array of applications. The large size of hyperspectral images limits the applicable uses and necessitates effective compression methods.
While many spectral-spatial compressors have been proposed in the past, there has been little work on the benefits of a spectral-only strategy. A spectral-only strategy not only has compressive capabilities but would also allow the classification of the compressed images, making the contributions of this thesis multi-fold. We present a Long Short-Term Memory (LSTM) autoencoder designed for the spectral compression of hyperspectral images. We show that this network can compress the images effectively with low reconstruction error and that it requires fewer training parameters than existing spectral-spatial compression methods. Existing learned compression models often require many of an image's pixels to be used for training. We demonstrate that our proposed network does not suffer a reduction in compression performance when the number of training examples is reduced. Existing compression techniques are limited in capability by their inclusion of spatial information, requiring reprocessing for all images that have sufficiently different scenes. We demonstrate the proposed network's robustness by training a single model for use in multiple scenes without the requirement of retraining the model from scene to scene. Furthermore, using the feature-extracting capabilities of an autoencoder, we analyzed the capabilities of the compressed image as a feature set for classification. Experimental results demonstrate that the unsupervised compressed features generated can be utilized for supervised machine learning classification tasks.
We also demonstrate that the robustness of the compressor allows a single network to compress and then classify new images without retraining and without significant loss.

Item Towards responsive user services in edge computing (Montana State University - Bozeman, College of Engineering, 2023) Rahman, Saidur; Co-chairs, Graduate Committee: Mike Wittie and Sean Yaw; This is a manuscript style paper that includes co-authored chapters.
Mobile applications can improve battery and application performance by offloading heavy processing tasks to more powerful compute servers. Cloud servers are located far from mobile devices and may not meet the responsiveness requirements of those applications. Edge servers are deployed at the edge of the network to provide the compute resources needed to achieve low latency. The combination of 5G and edge computing therefore has the potential to offer low-latency user services that can make mobile applications responsive. 5G provides fast communication between users and servers; however, additional communication delays can occur because of the increasing number of round trips needed to locate servers using the Domain Name System (DNS). I therefore propose a caching mechanism to reduce the DNS round-trip delay. Furthermore, the edge server and cellular tower use the same compute resources, which are limited. It is not clear how to place tasks on the limited edge resources or how to handle resource sharing when the Radio Access Network (RAN) process needs more computation resources to handle network traffic fluctuations. I therefore present several techniques to implement task checkpointing, task checkpointing overhead prediction, and task migration to provide low-latency and responsive services to mobile applications.
I also show how the proposed techniques can manage the shared resources between the mobile network and edge servers, utilize the available edge resources effectively, and increase users' quality of experience.

Item A comparative analysis of Ethereum gas price oracles' performance and a mechanism for Ethereum gas price prediction post-EIP-1559 (Montana State University - Bozeman, College of Engineering, 2022) Barada, Ibrahim; Chairperson, Graduate Committee: Laura Stanley
Blockchain transactions compete for limited space in blockchain blocks. Miners prefer to include transactions with higher fees in new blocks. Ethereum released EIP-1559 as an upgrade to its transaction pricing mechanism. The improvement proposal aims to stabilize the transaction pricing mechanism and improve the predictability of gas prices. In the context of Ethereum, gas price oracles predict fees such that transactions submitted at those fees make it into a block within a target delay. In practice, however, Ethereum gas price oracles are inaccurate, which makes it difficult for distributed applications to operate predictable services in terms of price and performance. To understand and measure oracle accuracy, we define new gas prediction performance metrics. We demonstrate that oracles underprice transactions, causing them to miss the delay target. We also show that oracles overprice transactions, causing them to meet the delay target, but at a higher-than-necessary cost. As a result of oracle inaccuracies, users tend to either wait longer or pay more than sufficient gas prices for a transaction to get into a block. We provide a comparative analysis of five gas price oracles before and after the release of EIP-1559, showing their performance in terms of accuracy of acceptance, underpricing, and overpricing. We also discuss the factors that influence oracle accuracy and the effects of those inaccuracies in terms of time and money wasted.
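For background, EIP-1559 ties the base fee of each block to how full the previous block was. The sketch below follows the update rule given in the EIP-1559 specification (a gas target of half the gas limit and a maximum change of one eighth per block). It illustrates the protocol rule only and is not the prediction mechanism proposed in the thesis; the function name and the example values are illustrative.

```python
# Constants as defined in the EIP-1559 specification.
BASE_FEE_MAX_CHANGE_DENOMINATOR = 8
ELASTICITY_MULTIPLIER = 2


def next_base_fee(parent_base_fee: int, parent_gas_used: int, parent_gas_limit: int) -> int:
    """Compute the next block's base fee (in wei) from the parent block.

    Uses integer arithmetic, as the specification does.
    """
    gas_target = parent_gas_limit // ELASTICITY_MULTIPLIER
    if parent_gas_used == gas_target:
        # Block exactly at target: base fee is unchanged.
        return parent_base_fee
    if parent_gas_used > gas_target:
        # Block fuller than target: base fee rises by at least 1 wei.
        gas_delta = parent_gas_used - gas_target
        fee_delta = max(
            parent_base_fee * gas_delta // gas_target // BASE_FEE_MAX_CHANGE_DENOMINATOR,
            1,
        )
        return parent_base_fee + fee_delta
    # Block emptier than target: base fee falls.
    gas_delta = gas_target - parent_gas_used
    fee_delta = parent_base_fee * gas_delta // gas_target // BASE_FEE_MAX_CHANGE_DENOMINATOR
    return parent_base_fee - fee_delta


# Illustrative values: 100 gwei base fee and a 30M gas limit.
GWEI = 10 ** 9
print(next_base_fee(100 * GWEI, 30_000_000, 30_000_000))  # full block
print(next_base_fee(100 * GWEI, 15_000_000, 30_000_000))  # block at target
print(next_base_fee(100 * GWEI, 0, 30_000_000))           # empty block
```

Because the base fee can move by at most 12.5% per block, block utilization and the current base fee together bound the base fee a transaction will face over the next few blocks.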
We apply our predefined metrics to study the performance of oracles before and after EIP-1559. We observe that EIP-1559 improved the transaction acceptance rate and shortened acceptance delays. On the other hand, we observe that EIP-1559 increased transaction overpricing. The current gas price prediction mechanisms required further investigation after the release of EIP-1559. Hence, we devised a new mechanism to predict gas prices of EIP-1559-compatible transactions on the Ethereum blockchain. The mechanism allows users to calculate gas prices based on the current block utilization and base fee. We measured the probability of acceptance, time wasted, and money wasted and noticed an increase in the probability of acceptance and a decrease in both time and money wasted in comparison to the currently existing oracles.

Item A framework to assess bug-bounty platforms based on potential attack vectors (Montana State University - Bozeman, College of Engineering, 2022) McCartney, Susan Ann; Co-chairs, Graduate Committee: Clemente Izurieta and Mike Wittie
Corporate computer security is becoming increasingly important because the frequency and severity of cyberattacks on businesses are high and increasing. One way to improve the security of company software is for a company to hire a third party to identify and report vulnerabilities, which are blocks of code that can be exploited. A bug-bounty program incentivizes ethical hackers (herein, 'researchers') to find and fix vulnerabilities before they can be exploited. For this reason, bug-bounty programs have been increasing in popularity since their inception a decade ago. However, the increase in their use and popularity also increases the likelihood of companies being targeted by malicious actors using a bug-bounty program as the medium.
The literature review and an investigation into the rules and requirements for bug-bounty platforms revealed that, although bug-bounty programs can improve a vendor's security, the programs still contain a serious security flaw. The platforms are not required to scan reports for malware, and there is no guidance asking vendors to scan for malware. This means it is possible to perform a cyberattack using malware as a report attachment. Through data collection from 22 platforms, an observational case study, and analysis of different malware, I have created a tool to assist vendors in selecting the platform of best fit and to characterize the possible attack surfaces presented by the file options allowed on the platform. The outcome of this research is evidence of the importance of understanding the malware files used as report attachments. However, more research is needed on the relationship between file extensions and malware in order to thoroughly comprehend the attack surface capabilities and to understand the trade-offs between security and convenience.