Theses and Dissertations at Montana State University (MSU)
Permanent URI for this collection: https://scholarworks.montana.edu/handle/1/733
Item: Risk mitigation focused on surgical care using process improvement methodologies in rural health systems (Montana State University - Bozeman, College of Engineering, 2023) Sitar, Nejc; Chairperson, Graduate Committee: Bernadette J. McCrory; This is a manuscript style paper that includes co-authored chapters.
Rural healthcare is represented by approximately one-third of community hospitals in the United States, primarily in the Midwest and Western United States. Due to the lack of resources and the demographic characteristics of rural populations, rural community hospitals are under constant pressure to meet Centers for Medicare & Medicaid Services (CMS) quality requirements. Meeting CMS quality requirements is particularly challenging in surgical care due to lower volumes and fewer research opportunities, in addition to a shortage of qualified surgical specialists. The perioperative surgical home (PSH) model was established as a health management concept in a rural community hospital located in the Northwest of the United States to improve the quality of care by providing a longitudinal approach to patient treatment. The main opportunities for PSH improvement were identified in the "decision for surgery," "preoperative," and "postoperative" stages of the PSH model. To improve PSH clinic performance, this thesis proposes an improved National Surgical Quality Improvement Program (NSQIP) calculator user interface (UI), as well as a new prediction model for total joint arthroplasty (TJA) length of stay (LOS). The improved layout of the NSQIP calculator was developed from two approved surveys using card sorting and Borda count methodology, while the new model for predicting TJA patients' LOS was based on a decision tree (DT) machine learning model. A usability study of the NSQIP calculator UI identified opportunities for future improvement, such as a reorganized layout of postoperative complications and the addition of a supporting tool that clearly defines postoperative complications. The new DT prediction model outperformed the currently used NSQIP calculator in prediction accuracy for TJA LOS, as it produced lower root-mean-square error (RMSE) values. Furthermore, the structure of the DT model allowed better interpretability of the decision-making process compared to the NSQIP calculator, which increased trust in and reliability of the calculated predictions. Despite some limitations, such as a small sample size, this study provides valuable information for future improvements in rural healthcare that would enable rural community hospitals to better predict future outcomes and meet strict CMS quality standards.
[A rough decision-tree/RMSE sketch illustrating this kind of comparison appears after this listing.]

Item: Data-driven approaches for distribution grid modernization: exploring state estimation, pseudo-measurement generation and false data detection (Montana State University - Bozeman, College of Engineering, 2023) Radhoush, Sepideh; Chairperson, Graduate Committee: Brad Whitaker
Distribution networks must be regularly updated to enhance their performance and meet customer electricity requirements. Advanced technologies and infrastructure--including two-way communication, smart measuring devices, distributed generation in various forms, electric vehicles, and variable loads--have been added to improve the overall efficiency of distribution networks.
Corresponding to these new features and structures, continuous control and monitoring of distribution networks should be intensified to keep track of any changes in distribution network performance. Distribution system state estimation has been introduced for real-time monitoring of distribution networks. State estimation calculations depend heavily on measurement data collected from measurement devices in distribution networks. However, installing measurement devices at every bus to make the distribution network fully observable is not possible. To address the lack of real measurements, pseudo-measurements are produced from historical load and generation data. Available measurements, along with the physical distribution network topology, are fed into a state estimation algorithm to determine system state variables. State estimation results are then sent to a control center for further processing to enhance distribution network operation. However, the accuracy of state estimation results can be degraded by false data injection attacks on measurement data; if these attacks are not detected, distribution network operation could be significantly affected. Different methods have been developed to enhance distribution network operation and management, and machine learning approaches have been identified as beneficial for solving several types of problems in a power grid. In this dissertation, machine learning is applied to three areas of distribution systems: generating pseudo-measurements, performing distribution system state estimation calculations, and detecting false data injection attacks on measurement data. In addition to addressing these areas individually, machine learning is used to perform distribution system state estimation and false data injection attack detection simultaneously, taking advantage of conventional and smart measurement data at different time scales. The results reveal that the operation and performance of a distribution network are improved using machine learning algorithms, leading to more effective power grid modernization.
[A rough sketch of classifier-based false data injection detection appears after this listing.]

Item: Automatic 2D material detection and quantum emission prediction using deep learning-based models (Montana State University - Bozeman, College of Engineering, 2023) Ramezani, Fereshteh; Chairperson, Graduate Committee: Brad Whitaker
The realm of quantum engineering holds immense promise for revolutionizing technological landscapes, particularly with the advent of 2D materials in quantum device applications. The fundamental properties of these materials make them pivotal in various quantum applications. However, progress in quantum engineering faces significant roadblocks, primarily centered around two challenges: accurate 2D material detection and understanding the random nature of quantum fluctuations. In response to the first challenge, I implemented a new deep learning pipeline to identify 2D materials in microscopic images. I used a state-of-the-art two-stage object detector and trained it on images containing flakes of varying thicknesses of hexagonal boron nitride (hBN, a 2D material). The trained model achieved high detection accuracy for the rare category of thin flakes (≤ 50 atomic layers thick). Further analysis shows that the proposed pipeline is robust against changes in color or substrate background and could be generalized to various microscope settings.
I have integrated my proposed method into the 2D quantum material pipeline (2D-QMaP), which has been under development by the MonArk Quantum Foundry, to provide automated capabilities that unite and accelerate the primary stages of sample preparation and device fabrication for 2D quantum materials research. My proposed algorithm has given the 2D-QMaP fully automatic, real-time 2D flake detection capabilities, which had not been done effectively before. To address the second challenge, I assessed the random nature of quantum fluctuations and developed time series forecasting deep learning models to analyze and predict quantum emission fluctuations for the first time. My trained models can roughly follow the actual trend of the data and, under certain data processing conditions, can predict peaks and dips of the fluctuations. The ability to anticipate these fluctuations will allow physicists to harness quantum fluctuation characteristics to develop novel scientific advances in quantum computing that will greatly benefit quantum technologies. The automated 2D material identification, which addresses the laborious process of flake detection, and the introduction of quantum fluctuation analysis with predictive capabilities not only streamline research processes but also hold the promise of creating more stable and dependable quantum emission devices, significantly advancing the broader field of quantum engineering.

Item: An evaluation of graph representation of programs for malware detection and categorization using graph-based machine learning methods (Montana State University - Bozeman, College of Engineering, 2023) Pearsall, Reese Andersen; Chairperson, Graduate Committee: Clemente Izurieta
With both new and reused malware being used in cyberattacks every day, there is a dire need for the ability to detect and categorize malware before damage can be done. Previous research has shown that graph-based machine learning algorithms can learn on graph representations of programs, such as a control flow graph, to better distinguish between malicious and benign programs and detect malware. With many types of graph representations of programs available, there has not been a comparison between these different graphs to see whether one performs better than the rest. This thesis provides a comparison between different graph representations of programs for both malware detection and categorization using graph-based machine learning methods. Four different graphs are evaluated: the control flow graph generated via disassembly, the control flow graph generated via symbolic execution, the function call graph, and the data dependency graph. This thesis also describes a pipeline for creating a classifier for malware detection and categorization. Graphs are generated using the binary analysis tool angr, and their embeddings are calculated using the Graph2Vec graph embedding algorithm. The embeddings are plotted and clustered using K-means. A classifier is then built by assigning labels to clusters and to the points within each cluster. We collected 2500 malicious executables and 2500 benign executables, and each of the four graph types was generated for each executable. Each graph type is fed into its own pipeline, a classifier for each of the four graph types is built, and classification metrics (e.g., F1 score) are calculated. The results show that control flow graphs generated from symbolic execution had the highest F1 score of the four graph representations.
Using the control flow graphs generated from the symbolic execution pipeline, the classifier was also able to categorize trojan malware most accurately.
[A rough sketch of this graph-embedding and clustering pipeline appears after this listing.]

Item: Healthcare analytics at a perioperative surgical home implemented community hospital (Montana State University - Bozeman, College of Engineering, 2022) Sridhar, Srinivasan; Chairperson, Graduate Committee: Bernadette J. McCrory; This is a manuscript style paper that includes co-authored chapters.
The Perioperative Surgical Home (PSH) is a novel patient-centric surgical system developed by the American Society of Anesthesiologists (ASA) to improve surgical outcomes and patient satisfaction. Compared to a traditional surgical system, the PSH is a coordinated interdisciplinary team encompassing all surgical care provided to patients from the perioperative phase through the recovery phase. However, limited research has been performed on augmenting PSH surgical care using healthcare analytics. In addition, adoption of the PSH in rural hospitals is limited. Compared to urban hospitals, rural hospitals have higher surgical care inequality due to the limited availability of clinicians and resources, resulting in poor access to surgical care. With an increase in the rate of Total Joint Replacement (TJR) procedures in the United States (US), rural hospitals are often under-resourced in coordinating perioperative services, resulting in inadequate communication, poor care continuity, and preventable complications. This study focused on developing a novel analytical framework to predict, evaluate, and improve TJR outcomes at a PSH-implemented rural community hospital. The study was segmented into three parts. The first part explored the effectiveness of a digital engagement platform for longitudinally engaging TJR patients located in rural areas. The second part evaluated the impact of the PSH system in the rural setting by analyzing and comparing TJR surgical outcomes. Finally, the third part explained the importance of machine learning in the rural PSH system for identifying critical patient factors, enhancing decision-making, and planning preventive interventions for better surgical outcomes. Results from this research demonstrated the importance of healthcare analytics in the PSH system and how it can help enhance TJR surgical outcomes and experience for both clinicians and patients.

Item: Improving the confidence of machine learning models through improved software testing approaches (Montana State University - Bozeman, College of Engineering, 2022) ur Rehman, Faqeer; Chairperson, Graduate Committee: Clemente Izurieta; This is a manuscript style paper that includes co-authored chapters.
Machine learning is gaining popularity in transforming and improving a number of different domains, e.g., self-driving cars, natural language processing, healthcare, manufacturing, retail, banking, and cybersecurity. However, because machine learning algorithms are computationally complex, verifying their correctness becomes challenging when the oracle is either unavailable or available but too expensive to apply. Software Engineering for Machine Learning (SE4ML) is an emerging research area that focuses on applying SE best practices and methods for better development, testing, operation, and maintenance of ML models. The focus of this work is the testing aspect of ML applications: adapting traditional software testing approaches to improve confidence in them.
First, a statistical metamorphic testing technique is proposed to test Neural Network (NN)-based classifiers in a non-deterministic environment. Furthermore, a metamorphic relation (MR) minimization algorithm is proposed for the program under test, saving computational costs and organizational testing resources. Second, an MR is proposed to address a data generation/labeling problem; that is, enhancing the effectiveness of test inputs by extending the prioritized test set with new tests without incurring additional labeling costs. Further, the prioritized test inputs are leveraged to propose a statistical hypothesis testing approach (for detection) and a machine learning-based approach (for prediction) of faulty behavior in two other machine learning classifiers, i.e., NN-based intrusion detection systems. Finally, to test unsupervised ML models, the metamorphic testing approach is utilized to make several contributions: i) proposing a broader set of 22 MRs for assessing the behavior of clustering algorithms under test, ii) providing a detailed analysis and reasoning to show how the proposed MRs can be used to target both the verification and validation aspects of testing the programs under investigation, and iii) showing that verifying MRs using multiple criteria is more beneficial than relying on a single criterion (i.e., clusters assigned). Thus, the work presented here makes a significant contribution to addressing the gaps found in the field and enhances the body of knowledge in the emergent SE4ML field.

Item: Automated techniques for prioritization of metamorphic relations for effective metamorphic testing (Montana State University - Bozeman, College of Engineering, 2022) Srinivasan, Madhusudan; Chairperson, Graduate Committee: John Paxton and Upulee Kanewala (co-chair)
An oracle is a mechanism to decide whether the outputs of the program for the executed test cases are correct. In many situations, the oracle is not available or is too difficult to implement. Metamorphic testing (MT) is a testing approach that uses metamorphic relations (MRs), properties of the software under test represented in the form of relations among inputs and outputs of multiple executions, to help verify the correctness of a program. Typically, MRs vary in their ability to detect faults in the program under test, and some MRs tend to detect the same set of faults. In this work, we aim to prioritize MRs to improve the efficiency and effectiveness of MT. We present five MR prioritization approaches: (1) fault-based, (2) coverage-based, (3) statement centrality-based, (4) variable-based, and (5) data diversity-based. To evaluate these MR prioritization approaches, we conducted experiments on complex open-source software systems and machine learning programs. Our results suggest that the proposed MR prioritization approaches outperform the current practice of executing the source and follow-up test cases of the MRs randomly. Further, our results show that the statement centrality-based and variable-based approaches outperform the coverage-based and random-based approaches. Also, the proposed approaches show a 21% higher rate of fault detection over random-based prioritization. For machine learning programs, the proposed data diversity-based MR prioritization approach increases fault detection effectiveness by up to 40% compared to the coverage-based approach and reduces the time taken to detect a fault by 29% compared to random execution of MRs.
Further, all the proposed approaches reduce the number of MRs that need to be executed. Overall, our work would result in saving time and cost during the metamorphic testing process.
[A rough sketch of fault-based MR prioritization appears after this listing.]

Item: Machine learning for pangenomics (Montana State University - Bozeman, College of Engineering, 2021) Manuweera, Buwani Sakya; Chairperson, Graduate Committee: Brendan Mumey; This is a manuscript style paper that includes co-authored chapters.
Finding genotype-phenotype associations is an important task in biology. Most of the existing reference-based methods introduce biases because they use a single genome from an individual as the reference sequence. These biases can lead to limitations in inferred genotype-phenotype associations. Advances in sequencing techniques have enabled access to a large number of sequenced genomes from multiple organisms across different species. These can be used to create a pangenome, which represents a collection of genetic information from multiple organisms. Using a pangenome can effectively reduce these limitations, as it does not require a single reference. Recently, machine learning techniques have emerged as effective methods for problems involving genomics and pangenomics data. Kernel methods are used as part of machine learning models to compute similarities between instances; kernels map the given data into a different feature space that can help separate the data into their corresponding classes. In this work, we develop supervised machine learning models using a set of features gathered from pangenomic graphs, and the effectiveness of those features is evaluated in predicting yeast phenotypes. We first evaluated the effectiveness of the features using a traditional supervised machine learning model and then compared it to novel custom kernels that incorporate information from the pangenomic graph structure. Experimental results using yeast phenotypes indicate that the developed machine learning models that use reference-free features and the novel kernels outperform models based on traditional reference-based features. This work has implications for bioinformaticians and computational biologists working with pangenomes, as well as computer scientists developing predictive models for genomic data.

Item: Optimizing site-specific nitrogen fertilizer management based on maximized profit and minimized pollution (Montana State University - Bozeman, College of Agriculture, 2022) Hegedus, Paul Briggs; Chairperson, Graduate Committee: Bruce D. Maxwell and Stephanie A. Ewing (co-chair); This is a manuscript style paper that includes co-authored chapters.
Application of nitrogen fertilizers beyond crop needs contributes to nitrate pollution and soil acidification. Excess nitrogen applications are most prevalent when synthetic fertilizers are applied at uniform rates across fields. Precision agroecology harnesses the tools and technology of variable-rate precision agriculture, a common but underutilized management strategy, to make ecologically conscious decisions about field management that promote economic and environmental sustainability. On-farm precision experimentation provides the basis for making data-driven ecological management decisions through field-specific assessment of crop responses.
This dissertation used on-farm experimentation with variable nitrogen fertilizer rates, combined with intensive data collection and data science, to address its main objective: the development and evaluation of optimized nitrogen fertilizer management at the subfield scale, based on maximizing farmer net returns and nitrogen use efficiency. The response of winter wheat yield and grain protein concentration to rates of nitrogen fertilizer application varied among fields and across time, which influenced the model form used to characterize the relationships of grain yield and quality to fertilizer within a field. Machine learning approaches, such as random forest regression, tended to provide the lowest error when forecasting future crop responses. Machine learning also demonstrated its utility for agronomic applications, as a support vector regression model provided the most accurate predictions of nitrogen use efficiency at the subfield scale. Crop response and nitrogen use efficiency models were integrated into a decision-making framework for optimized site-specific nitrogen fertilizer management that balances maximized profit against minimized potential for nitrogen loss. Simulations of optimized site-specific nitrogen fertilizer management compared to farmers' status quo management showed a 100% probability, across all fields tested, that mean net returns from the site-specific approaches exceeded those from applications of farmer-selected nitrogen fertilizer rates. However, even when minimization of the potential for nitrogen loss was considered in identifying optimum nitrogen fertilizer rates, there was field-specific variation in the probability that site-specific nitrogen fertilizer management, compared to farmer-selected rates, reduced the total amount of nitrogen applied across a field.
[A rough sketch of crop-response modeling and nitrogen-rate optimization appears after this listing.]

Item: Automated clinical transcription for behavioral health clinicians (Montana State University - Bozeman, College of Engineering, 2022) Kazi, Nazmul Hasan; Chairperson, Graduate Committee: Brendan Mumey; This is a manuscript style paper that includes co-authored chapters.
Mental health disorders are among the most common and most expensive healthcare conditions in the world, yet more than half of all patients go untreated for reasons such as lack of access to resources and clinicians. Providers rely on Electronic Health Records (EHRs) to compile and share clinical notes, a key component of clinical practice, but time-consuming data entry is considered one of the primary downsides of EHRs. Many practitioners spend more time on EHR documentation than on direct patient care, which adds to patient dissatisfaction and clinician burnout. In this work, we explore the feasibility of developing an end-to-end clinical transcription tool that fully automates the documentation process for behavioral health clinicians. We divide the task into several sub-tasks and primarily focus on the following: 1) extraction and classification of important information from patient-provider conversations, and 2) generation of clinical notes from the extracted information. We develop a dataset of 65 transcripts from simulated provider-patient conversations. We then fine-tune a transformer language model that shows promising results for personalized data extraction (F1 = 0.94) and room for improvement in classifying extracted information into EHR categories (F1 = 0.18).
Furthermore, we develop a rule-based natural language generation module that formalizes all types of extracted information and synthesizes them into clinical notes. The overall pipeline shows the potential of automatically generating draft clinical notes and reducing documentation time for behavioral health clinicians by 70-80%. The findings of this work have implications for behavioral health care providers as well as machine learning and natural language processing application developers.
[A rough sketch of rule-based note generation appears after this listing.]
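The Sitar (2023) abstract above compares a decision tree regressor against the NSQIP calculator using root-mean-square error for total joint arthroplasty length of stay. A minimal sketch of that kind of comparison follows, using scikit-learn; the file path, column names, and hyperparameters are hypothetical placeholders, not the thesis data or code.

```python
# Minimal sketch (not the thesis code): fit a decision-tree regressor for length of
# stay (LOS) and compare its RMSE against an existing calculator's predictions.
# The file path, column names, and hyperparameters are illustrative assumptions.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor, export_text
from sklearn.metrics import mean_squared_error

df = pd.read_csv("tja_patients.csv")            # placeholder dataset
features = ["age", "bmi", "asa_class"]          # hypothetical predictors
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["los_days"], test_size=0.2, random_state=0)

tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X_train, y_train)

rmse_tree = np.sqrt(mean_squared_error(y_test, tree.predict(X_test)))
# "calculator_los" stands in for the calculator's predicted LOS, if it were recorded.
rmse_calc = np.sqrt(mean_squared_error(y_test, df.loc[X_test.index, "calculator_los"]))
print(f"decision tree RMSE: {rmse_tree:.2f}   calculator RMSE: {rmse_calc:.2f}")

# A shallow tree also exposes its decision rules, which supports the interpretability
# argument made in the abstract.
print(export_text(tree, feature_names=features))
```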
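The Radhoush (2023) abstract above applies machine learning to detect false data injection attacks on measurement data. The following is a hedged, self-contained sketch of that general idea using synthetic numbers and a scikit-learn classifier; the measurement dimensions, attack model, and classifier choice are illustrative assumptions rather than the dissertation's setup.

```python
# Hypothetical sketch: train a classifier to flag measurement snapshots carrying false
# data injection attacks. The synthetic data below is purely illustrative; a real study
# would use simulated feeder measurements from a test distribution network.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(0)
clean = rng.normal(1.0, 0.02, size=(2000, 12))          # per-bus measurement snapshots
attacked = clean[:500].copy()
# Inject a small bias on one randomly chosen measurement channel per attacked snapshot.
attacked[np.arange(500), rng.integers(0, 12, size=500)] += 0.1

X = np.vstack([clean[500:], attacked])
y = np.hstack([np.zeros(1500), np.ones(500)])            # 1 = falsified snapshot

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)
clf = GradientBoostingClassifier().fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
```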
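The Pearsall (2023) abstract above describes a pipeline that builds program graphs with angr, embeds them with Graph2Vec, clusters the embeddings with K-means, and labels clusters to form a classifier. A rough sketch of that flow is below; the karateclub implementation of Graph2Vec, the sample paths, the cluster-labeling rule, and all parameters are assumptions made for illustration, not the thesis implementation.

```python
# Hypothetical sketch: binary -> control flow graph (angr) -> whole-graph embedding
# (Graph2Vec) -> K-means clusters -> majority-label classifier.
import networkx as nx
import numpy as np
import angr                           # binary analysis framework named in the abstract
from karateclub import Graph2Vec      # one publicly available Graph2Vec implementation
from sklearn.cluster import KMeans

def cfg_of(binary_path):
    """Recover a CFG via static disassembly and relabel nodes as integers 0..n-1."""
    project = angr.Project(binary_path, auto_load_libs=False)
    cfg = project.analyses.CFGFast()
    return nx.convert_node_labels_to_integers(nx.Graph(cfg.graph))

binaries = ["samples/benign_0.exe", "samples/malware_0.exe"]   # placeholder corpus
labels = np.array([0, 1])                                       # 0 = benign, 1 = malicious

graphs = [cfg_of(path) for path in binaries]
embedder = Graph2Vec(dimensions=64)
embedder.fit(graphs)
X = embedder.get_embedding()

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
# Assign each cluster the majority label of the labeled points it contains
# (assumes every cluster contains at least one labeled sample).
cluster_label = {c: int(np.round(labels[kmeans.labels_ == c].mean()))
                 for c in range(kmeans.n_clusters)}
predictions = [cluster_label[c] for c in kmeans.labels_]
```

In practice the corpus would contain the thousands of executables mentioned in the abstract, and malware family labels rather than binary labels would support categorization as well as detection.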
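The Srinivasan (2022) abstract above lists fault-based prioritization among its five MR prioritization approaches. One plausible reading of fault-based prioritization is a greedy ordering in which each next MR detects the most not-yet-detected faults; the sketch below illustrates that idea on made-up kill data and is not the thesis algorithm.

```python
# Hypothetical sketch: order metamorphic relations (MRs) greedily so that each next MR
# detects the largest number of faults not yet covered by earlier MRs.
# The kill sets below are invented illustrative data, not results from the thesis.
kills = {                      # MR name -> set of fault/mutant ids it detects
    "MR_permute_input": {1, 2, 5},
    "MR_scale_input":   {2, 3},
    "MR_add_noise":     {4},
    "MR_negate_input":  {1, 5, 6},
}

def prioritize(kill_sets):
    remaining = set().union(*kill_sets.values())
    order, pool = [], dict(kill_sets)
    while pool and remaining:
        # Pick the MR that detects the most faults not yet covered.
        best = max(pool, key=lambda mr: len(pool[mr] & remaining))
        order.append(best)
        remaining -= pool.pop(best)
    order.extend(pool)          # append MRs that add no new fault coverage
    return order

print(prioritize(kills))        # MRs ordered by largest additional fault coverage first
```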
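The Hegedus (2022) abstract above fits machine learning models, such as random forest regression, to on-farm trial data and then selects nitrogen rates that trade off net return against nitrogen loss. A simplified, profit-only sketch of that workflow is shown here; the file, covariates, prices, and rate grid are hypothetical, and the real framework also accounts for nitrogen-loss potential.

```python
# Hypothetical sketch: fit a random forest to observed yield responses from on-farm
# nitrogen-rate trials, then pick, per location, the rate maximizing predicted net return.
# Prices, costs, column names, and the data file are illustrative assumptions.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

trials = pd.read_csv("onfarm_trials.csv")          # placeholder: one row per plot
features = ["n_rate", "elevation", "prev_yield"]   # hypothetical covariates
model = RandomForestRegressor(n_estimators=500, random_state=0)
model.fit(trials[features], trials["yield"])

GRAIN_PRICE = 0.25                  # $/kg grain, assumed
N_COST = 1.10                       # $/kg N, assumed
candidate_rates = np.arange(0, 200, 10)             # kg N / ha

def optimum_rate(site_row):
    """Return the nitrogen rate maximizing predicted net return for one location."""
    grid = pd.DataFrame({
        "n_rate": candidate_rates,
        "elevation": site_row["elevation"],
        "prev_yield": site_row["prev_yield"],
    })
    net = GRAIN_PRICE * model.predict(grid[features]) - N_COST * candidate_rates
    return candidate_rates[int(np.argmax(net))]

trials["opt_n_rate"] = trials.apply(optimum_rate, axis=1)
```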
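The Kazi (2022) abstract above pairs information extraction with a rule-based natural language generation module that turns categorized facts into draft clinical notes. A toy sketch of such a template-filling step follows; the categories, templates, and example facts are invented for illustration and are not the thesis rules or data.

```python
# Hypothetical sketch of a rule-based note-generation step: extracted, categorized
# facts are slotted into EHR-style section templates to produce a draft note.
extracted = [
    {"category": "chief_complaint", "text": "trouble sleeping for three weeks"},
    {"category": "medication",      "text": "sertraline 50 mg daily"},
    {"category": "plan",            "text": "begin weekly CBT sessions"},
]

TEMPLATES = {
    "chief_complaint": "Chief complaint: patient reports {text}.",
    "medication":      "Current medications: {text}.",
    "plan":            "Plan: {text}.",
}

def draft_note(facts):
    """Group facts by category and render one line per fact using its template."""
    lines = []
    for category in TEMPLATES:                       # fixed section order
        for fact in (f for f in facts if f["category"] == category):
            lines.append(TEMPLATES[category].format(text=fact["text"]))
    return "\n".join(lines)

print(draft_note(extracted))
```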