Scholarship & Research

Permanent URI for this community: https://scholarworks.montana.edu/handle/1/1

Search Results

Now showing 1 - 10 of 12
  • Informing the construction of narrative-based risk communication
    (Montana State University - Bozeman, College of Engineering, 2019) King, Henry William; Chairperson, Graduate Committee: Clemente Izurieta
    The current communication of flood risk by government agencies and the scientific community to citizens living in the floodplain is ineffective. Using the Narrative Policy Framework (NPF), this communication can be enhanced through Hero, Victim, and Victim-to-Hero character-based narratives. This thesis describes the methods used to help users of the NPF construct and test narratives using computational methods. Four natural language processing tasks are described: topic modeling, sentiment analysis, classification, and term frequencies. It was found that using the difference of transformed relative term frequencies produced an adequate vocabulary for each style of narrative. The narratives constructed from these vocabularies were used in work that sought to formalize the narrative construction process, and in focus group studies which found that narrative-based scientific messages increased affective response versus traditional scientific messaging.
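The vocabulary-building step described above can be sketched as follows. This is a minimal illustration, not the thesis's actual pipeline: the square-root transform and the toy Hero/Victim corpora are assumptions, chosen only to show how differencing transformed relative term frequencies surfaces terms characteristic of one narrative style.

```python
from collections import Counter

def relative_term_frequencies(docs):
    """Relative frequency of each term across a list of tokenized documents."""
    counts = Counter(tok for doc in docs for tok in doc)
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}

def characteristic_terms(target_docs, contrast_docs, transform=lambda f: f ** 0.5):
    """Rank terms by the difference of transformed relative frequencies:
    high positive scores mark terms characteristic of the target corpus."""
    target = relative_term_frequencies(target_docs)
    contrast = relative_term_frequencies(contrast_docs)
    vocab = set(target) | set(contrast)
    scores = {t: transform(target.get(t, 0.0)) - transform(contrast.get(t, 0.0))
              for t in vocab}
    return sorted(scores, key=scores.get, reverse=True)

# Toy corpora standing in for Hero- and Victim-style narrative documents.
hero_docs = [["rescue", "save", "flood"], ["save", "protect", "levee"]]
victim_docs = [["loss", "damage", "flood"], ["loss", "helpless", "water"]]
print(characteristic_terms(hero_docs, victim_docs)[0])  # prints: save
```

Terms shared by both corpora (like "flood") score near zero, so the ranking isolates the vocabulary that distinguishes each narrative style.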
  • The identification, categorization, and evaluation of model-based behavioral decay in design patterns
    (Montana State University - Bozeman, College of Engineering, 2019) Reimanis, Derek Kristaps; Chairperson, Graduate Committee: Clemente Izurieta; Clemente Izurieta was a co-author of the article, 'Evaluations of behavioral technical debt in design patterns: a multiple longitudinal case study' submitted to the journal 'IEEE transactions on software engineering' which is contained within this thesis.
    Software quality assurance (QA) techniques seek to provide software developers and managers with the methods and tools necessary to monitor their software product to encourage fast, on-time, and bug-free releases for their clients. Ideally, QA methods and tools provide significant value and highly-specialized results to product stakeholders, while being fully incorporated into an organization's process and with actionable and easy-to-interpret outcomes. However, modern QA techniques fall short of these goals because they feature only structural analysis techniques, which do not fully illuminate all intricacies of a software product. Additionally, many modern QA methods are not capable of capturing domain-specific concerns, which suggests their results are not fulfilling their potential. To assist in the remediation of these issues, we have performed a comprehensive study of a previously unexamined phenomenon in the field of QA, namely model-based behavioral analysis. In this sense, behavioral analysis refers to the mechanisms that occur in a software product as the product is executing its code, at system run-time. We approach this problem from a model-based perspective because models are not tied to program-specific behaviors, so findings are more generalizable. Our procedure follows an intuitive process: first the identification of model-based behavioral issues, then the classification and categorization of these behavioral issues into a taxonomy, and finally the evaluation of their effect on software quality. Our results include a taxonomy that captures and provides classifications for known model-based behavioral issues. We identified relationships between behavioral issues and existing structural issues to illustrate that the inclusion of behavioral analysis provides a new perspective into the inner mechanisms of software systems.
We extended an existing state-of-the-art operational software quality measurement technique to incorporate these newfound behavioral issues. Finally, we used this quality extension to evaluate the effects of behavioral issues on system quality, and found that software quality has a strong inverse relationship with behavioral issues.
  • Exploring the feasibility of an automated biocuration pipeline for research domain criteria
    (Montana State University - Bozeman, College of Engineering, 2019) Anani, Mohammad; Chairperson, Graduate Committee: Indika Kahanda
    Research on mental disorders has been largely based on manuals such as the ICD-10 (International Classification of Diseases) and DSM-5 (the Diagnostic and Statistical Manual of Mental Disorders), which rely on the signs and symptoms of disorders for classification. However, this approach tends to overlook the underlying mechanisms of brain disorders and does not express the heterogeneity of those conditions. Thus, the National Institute of Mental Health (NIMH) introduced a new framework for mental illness research, namely, Research Domain Criteria (RDoC). RDoC is a research framework which utilizes various units of analysis from genetics, neural circuits, etc., for accurate multi-dimensional classification of mental illnesses. The RDoC framework is manually updated with units of analysis in periodic workshops. The process of updating the RDoC framework is accomplished by researching relevant evidence in the literature by domain experts. Due to the large amount of relevant biomedical research available, developing a method to automate the process of extracting evidence from the biomedical literature to assist with the curation of the RDoC matrix is key. In this thesis, we formulate three tasks that would be necessary for an automated biocuration pipeline for RDoC: 1) labeling biomedical articles with RDoC constructs, 2) retrieval of brain research articles, and 3) extraction of relevant data from these articles. We model the first problem as a multilabel classification problem with 26 constructs of RDoC and use a gold-standard dataset of annotated PubMed abstracts and employ various supervised classification algorithms. The second task classifies general PubMed abstracts relevant to brain research using the same data from the first task and other unlabeled abstracts for training a model. Finally, for the third task, we attempt to extract Problem, Intervention, Comparison, and Outcomes (PICO) elements and brain region mentions from a subset of the RDoC abstracts.
To the best of our knowledge, this is the first study aimed at automated data extraction and retrieval of RDoC-related literature. The results of automating the aforementioned tasks are promising: we have a very accurate multilabel classification model, a good retrieval model, and an accurate brain region extraction model.
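The multilabel formulation above is commonly decomposed via binary relevance: one independent yes/no decision per RDoC construct. The sketch below illustrates only that decomposition; the thesis trains supervised classifiers on a gold-standard PubMed dataset, whereas the keyword-overlap "classifier", the labels, and the threshold here are stand-ins.

```python
from collections import Counter

def train_binary_relevance(abstracts, label_sets):
    """Binary relevance: build one bag-of-words 'prototype' per label
    from the training abstracts annotated with that label."""
    prototypes = {}
    for label in {l for labels in label_sets for l in labels}:
        tokens = Counter()
        for doc, labels in zip(abstracts, label_sets):
            if label in labels:
                tokens.update(doc.lower().split())
        prototypes[label] = tokens
    return prototypes

def predict_labels(prototypes, abstract, threshold=2):
    """Assign every label whose prototype shares at least `threshold` token
    occurrences with the abstract; each decision is made independently,
    which is the essence of multilabel binary relevance."""
    tokens = Counter(abstract.lower().split())
    predicted = set()
    for label, proto in prototypes.items():
        overlap = sum(min(n, proto[t]) for t, n in tokens.items())
        if overlap >= threshold:
            predicted.add(label)
    return predicted

model = train_binary_relevance(
    ["arousal response sleep wake arousal", "reward learning dopamine reward"],
    [{"Arousal"}, {"Reward"}])
print(predict_labels(model, "sleep wake arousal study"))  # prints {'Arousal'}
```

A real pipeline would replace the overlap score with a trained per-label classifier, but the one-label-one-decision structure is the same.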
  • Mitigating software engineering costs in distributed ledger technologies
    (Montana State University - Bozeman, College of Engineering, 2018) Heinecke, Jonathan Taylor; Chairperson, Graduate Committee: Mike Wittie
    Distributed ledger technologies (DLTs) are currently dominating the field of distributed systems research and development. The Ethereum blockchain is emerging as a popular DLT platform for developing software and applications. Several challenges in Ethereum software development are the complex nature of working with DLTs, the lack of tools for developing on this DLT, and poor documentation of concepts for DLT developers. In this thesis, we provide building blocks that reduce the complexity of DLT operations and lower the barrier to entry into DLT development. We do this by providing a Node.js library, Ethereum-Easy, that simplifies operations on Ethereum. We implement this library in a sample application called Rock, Paper, Scissors (RPS) and build a continuous integration, continuous delivery pipeline for deploying Ethereum code (Jenk-Thereum). This thesis aims to make development on DLTs easier, quicker, and less expensive.
  • Computational investigation on protein sequencing and genome rearrangement problems
    (Montana State University - Bozeman, College of Engineering, 2018) Qingge, Letu; Chairperson, Graduate Committee: Binhai Zhu
    De novo protein sequencing and genome rearrangement problems are classical problems in bioinformatics. The de novo protein sequencing problem tries to determine the whole sequence of amino acids of a protein from mass spectrometry data without using a database search. Genome rearrangement problems try to recover the evolutionary process between two species. In this dissertation, first, we describe the process of constructing target protein sequences by utilizing mass-spectrometry-based data from both top-down and bottom-up tandem mass spectra. In addition to using data from mass spectrometry analysis, we also utilize techniques for de novo protein sequencing using a homologous protein sequence as a reference to attempt to fill in any remaining gaps in the constructed protein scaffold. Initial results on real datasets yield 96-100% coverage and 73-91% accuracy with respect to the target protein sequence. Second, we use different genome rearrangement operations to transform one genome into another such that the similarity between the two genomes is maximized. We explore these problems in terms of theoretical and experimental analysis. For the problem of sorting unsigned genomes by double cut and join (DCJ) operations, we design a randomized fixed-parameter tractable (FPT) approximation algorithm for computing the DCJ distance with an approximation factor of 4/3 + epsilon and a running time of O*(2^d*), where d* represents the optimal DCJ distance. For the one-sided exemplar adjacency number problem, we reformulate the problem as maximum independent set in a colored interval graph and hence reduce the problem to instances where each gene appears at most twice. Moreover, we design a factor-2 approximation and also show that the approximation factor cannot be improved below 2 by certain local search techniques. Finally, we apply integer linear programming to solve the reduced instance exactly.
For the minimum copy number generation problem, we analyze the complexity of different variations of the problem and present a practical algorithm for the general case based on a greedy method.
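The notion of genome similarity that adjacency-based rearrangement problems maximize can be sketched concretely. This is a simplified illustration, assuming linear, unsigned genomes with no duplicated genes (the exemplar and DCJ settings in the dissertation are considerably more general):

```python
def adjacencies(genome):
    """Unordered adjacent gene pairs in a linear, unsigned genome."""
    return {frozenset(pair) for pair in zip(genome, genome[1:])}

def shared_adjacencies(g1, g2):
    """Number of adjacencies preserved between two genomes -- the kind of
    quantity that exemplar-adjacency rearrangement problems maximize."""
    return len(adjacencies(g1) & adjacencies(g2))

# {1,2}, {2,3}, and {4,5} survive the rearrangement; {3,4} is broken.
print(shared_adjacencies([1, 2, 3, 4, 5], [3, 2, 1, 5, 4]))  # prints 3
```

An identical pair of genomes of length n shares all n-1 adjacencies, so the count drops as rearrangement operations break gene neighborhoods apart.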
  • Torchbearer: a multi-pipeline approach to landmark-based navigation
    (Montana State University - Bozeman, College of Engineering, 2018) Vollmer, Fredric Muller; Chairperson, Graduate Committee: Mike Wittie
    The task of navigation adds cognitive distraction to the already demanding task of driving. Most popular navigation aids provide verbal directions based solely on distances and street names, but the inclusion of landmark descriptions in these instructions can improve navigation performance, decrease unsafe driving behaviors and reduce cognitive load. Current approaches to selecting landmarks and building landmark-based instructions rely on a single source of data, thereby limiting the set of potential landmarks, or use a single factor in choosing the best landmark, failing to account for all characteristics that make a landmark suitable for navigation. We develop a multi-pipeline system that leverages both human (crowd-sourced) input and machine-based approaches to find, describe and choose the best landmark. Additionally, we develop a mobile application for the delivery of navigation instructions based on landmarks. We evaluate the cost and performance differences between these pipelines, as well as study the effect of landmark navigation prompts on cognitive load, safe driving behavior and driver satisfaction via an in situ experiment.
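The multi-factor landmark choice criticized as missing from single-factor systems can be sketched as a weighted score. The factor names, scales, and weights below are hypothetical illustrations, not the thesis's actual model:

```python
def landmark_score(landmark, weights=None):
    """Combine several suitability factors (each on a 0-1 scale) into one
    score. Factors and weights here are illustrative assumptions."""
    weights = weights or {"visibility": 0.4, "proximity": 0.4, "uniqueness": 0.2}
    return sum(w * landmark[factor] for factor, w in weights.items())

def best_landmark(candidates):
    """Pick the candidate with the highest combined suitability score."""
    return max(candidates, key=landmark_score)

candidates = [
    {"name": "red barn", "visibility": 0.9, "proximity": 0.8, "uniqueness": 0.7},
    {"name": "mailbox", "visibility": 0.3, "proximity": 0.9, "uniqueness": 0.2},
]
print(best_landmark(candidates)["name"])  # prints: red barn
```

A single-factor chooser (say, proximity alone) would pick the mailbox here, which is the failure mode the multi-factor approach addresses.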
  • Design and implementation of a real-time system to characterize functional connectivity between cortical areas
    (Montana State University - Bozeman, College of Engineering, 2017) Parsa Gharamaleki, Mohammadbagher; Chairperson, Graduate Committee: Brendan Mumey
    Despite a thorough mapping of the anatomical connectivity between brain regions and decades of neurophysiological studies of neuronal activity within the various areas, our understanding of the nature of the neural signals sent from one area to another remains rudimentary. Orthodromic and antidromic activation of neurons via electrical stimulation ('collision testing') has been used in the peripheral nervous system and in subcortical structures to identify signals propagating along specific neural pathways. However, low yield makes this method prohibitively slow for characterizing cortico-cortical connections. We employed recent advances in electrophysiological methods to improve the efficiency of the collision technique between cortical areas. There are three key challenges: 1) maintaining neuronal isolations following stimulation, 2) increasing the number of neurons being screened, and 3) ensuring low-latency triggering of stimulation after spontaneous action potentials. We have developed a software-hardware solution for online isolations and stimulation triggering, which operates in conjunction with two processing options: a Hardware Processing Platform (HPP) or a Software Processing Platform (SPP). The HPP is a 'system on a chip' solution enabling real-time processing in a re-programmable hardware platform, whereas the SPP is a small Intel Atom processor that allows soft real-time computing on a CPU. Employing these solutions for template matching both accelerates spike sorting and provides the low-latency triggering of stimulation required to produce collision trials. Recording with a linear tetrode array electrode allows simultaneous screening of multiple neurons, while the software package coordinates efficient collision testing of multiple user-selected units across channels.
This real-time connectivity screening system enables researchers working with a variety of animal models and brain regions to identify the functional properties of specific projections between cortical areas in behaving animals.
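The template-matching step that triggers stimulation can be sketched in its simplest form: slide a stored spike waveform along the incoming signal and fire when the mismatch drops below a threshold. The sum-of-squared-differences criterion, the toy waveform, and the threshold below are illustrative assumptions; the actual system runs this on FPGA or soft real-time hardware.

```python
def template_match(signal, template, threshold):
    """Slide a stored spike template along the signal and return the first
    index where the sum of squared differences falls below `threshold` --
    the point at which a real-time system would trigger stimulation."""
    k = len(template)
    for i in range(len(signal) - k + 1):
        ssd = sum((signal[i + j] - template[j]) ** 2 for j in range(k))
        if ssd < threshold:
            return i
    return None  # no spike matching this unit's template

template = [0.0, 1.0, -0.5, 0.0]
signal = [0.1, 0.0, 0.1, 0.0, 0.95, -0.55, 0.05, 0.1]
print(template_match(signal, template, threshold=0.05))  # prints 3
```

In practice one template is kept per isolated unit, so the same scan run against several templates lets the system screen multiple neurons on each channel.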
  • Computational pan-genomics: algorithms and applications
    (Montana State University - Bozeman, College of Engineering, 2018) Cleary, Alan Michael; Chairperson, Graduate Committee: Brendan Mumey
    As the cost of sequencing DNA continues to drop, the number of sequenced genomes rapidly grows. In the recent past, the cost dropped so low that it is no longer prohibitively expensive to sequence multiple genomes for the same species. This has led to a shift from the single reference genome per species paradigm to the more comprehensive pan-genomics approach, where populations of genomes from one or more species are analyzed together. The total genomic content of a population is vast, requiring algorithms for analysis that are more sophisticated and scalable than existing methods. In this dissertation, we explore new algorithms and their applications to pan-genome analysis, both at the nucleotide and genic resolutions. Specifically, we present the Approximate Frequent Subpaths and Frequented Regions problems as a means of mining syntenic blocks from pan-genomic de Bruijn graphs and provide efficient algorithms for mining these structures. We then explore a variety of analyses that mining synteny blocks from pan-genomic data enables, including meaningful visualization, genome classification, and multidimensional-scaling. We also present a novel interactive data mining tool for pan-genome analysis -- the Genome Context Viewer -- which allows users to explore pan-genomic data distributed across a heterogeneous set of data providers by using gene family annotations as a unit of search and comparison. Using this approach, the tool is able to perform traditionally cumbersome analyses on-demand in a federated manner.
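The pan-genomic de Bruijn graph underlying the mining problems above can be sketched minimally: every (k-1)-mer is a node and every k-mer observed in any genome is an edge, so multiple genomes share one graph. This is a toy construction under assumed inputs, not the dissertation's implementation:

```python
from collections import defaultdict

def de_bruijn_graph(sequences, k):
    """Build a de Bruijn graph over several genomes at once: nodes are
    (k-1)-mers, directed edges are observed k-mers. Sharing one graph
    across genomes is the starting point for pan-genomic synteny mining."""
    graph = defaultdict(set)
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)

g = de_bruijn_graph(["ACGTACG", "TACGTT"], k=3)
print(sorted(g["GT"]))  # prints ['TA', 'TT']
```

Paths traversed by many of the input genomes correspond to shared sequence, which is what frequented-region mining looks for in this structure.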
  • Exploring timeliness for accurate location recommendation on location-based social networks
    (Montana State University - Bozeman, College of Engineering, 2017) Xu, Yi; Chairperson, Graduate Committee: Qing Yang
    An individual's location history in the real world implies his or her interests and behaviors. Accordingly, people who share similar location histories are likely to have common interests and behaviors. This thesis analyzes the Collaborative Filtering (CF) approach, which mines an individual's preferences from his or her geographic location history and recommends locations based on the similarities between the user and others. We find that a CF-based recommendation process can be summarized as a sequence of multiplications between a transition matrix and a visited-location matrix. The transition matrix is usually approximated by a user-interest matrix that reflects the similarity among users with regard to their interest in visiting different locations. The visited-location matrix provides the history of visited locations of all users that is currently available to the recommendation system. We find that recommendation results will converge if and only if the transition matrix remains unchanged; otherwise, the recommendations will be valid for only a certain period of time. Based on our analysis, a novel location-based accurate recommendation (LAR) method is proposed, which considers the semantic meaning and category information of locations, as well as the timeliness of recommendation results, to make accurate recommendations. We evaluated the precision and recall of LAR using a large-scale real-world dataset collected from Brightkite. Evaluation results confirm that LAR offers more accurate recommendations compared to state-of-the-art approaches.
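The "transition matrix times visited-location matrix" view of CF can be made concrete with a tiny example. The user-similarity values below are hypothetical placeholders (a real system would derive them from location histories), and this sketch omits LAR's category and timeliness refinements:

```python
def matmul(a, b):
    """Plain matrix multiplication over nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

# 3 users x 3 locations; visited[u][l] = 1 if user u has visited location l.
visited = [[1, 0, 1],
           [1, 0, 0],
           [0, 1, 0]]
# Hypothetical user-interest (transition) matrix: row u holds user u's
# similarity weights over all users, normalized to sum to 1.
similarity = [[0.6, 0.4, 0.0],
              [0.5, 0.5, 0.0],
              [0.0, 0.0, 1.0]]

# One CF step: each user's location scores are a similarity-weighted
# blend of everyone's visit histories.
scores = matmul(similarity, visited)
print(round(scores[1][2], 2))  # prints 0.5
```

User 1 never visited location 2, but the similar user 0 did, so CF assigns it a positive score; repeating the multiplication converges only while the similarity matrix stays fixed, which is the timeliness observation the thesis builds on.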
  • Metamorphic relations ranking for reducing testing cost in scientific software
    (Montana State University - Bozeman, College of Engineering, 2017) Malallah, Safia Abdullhameed; Chairperson, Graduate Committee: Upulee Kanewala
    Lack of automated test oracles is a major challenge faced when testing scientific software. An oracle is a mechanism that determines whether test results are correct according to the expected behavior of the program. Metamorphic Testing (MT) is a testing technique that can be used to test such applications. This approach checks relations among multiple inputs and outputs of the program instead of checking the correctness of individual test outputs. These relations are called Metamorphic Relations (MRs), and their violation indicates faults in the System Under Test (SUT). Programs have several MRs with different fault-detection effectiveness, so the order in which they are applied determines the efficiency of the testing process. Therefore, in this work, we propose a strategy to prioritize MRs based on their potential fault-finding ability. Our strategy uses mutation testing to create a prioritized order of MRs for a given program. We evaluated our proposed approach using machine learning libraries in Weka as well as open-source mathematical programs; the results show that our strategy is effective in developing a prioritized order of MRs that maximizes early fault detection. Across 126 methods, we can detect 50.1%-100% of faults using 25% of the MRs, compared to 1.13%-100% with a random order.
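The mutation-based prioritization idea can be sketched end to end on a toy SUT. The two sine MRs, the hand-written mutants, and the test inputs below are illustrative assumptions; the thesis applies the same kill-count ranking to Weka and open-source mathematical code:

```python
import math

# Two metamorphic relations for a sine implementation f: each checks an
# input-output relation rather than an exact expected value (no oracle needed).
MRS = {
    "sin(pi - x) == sin(x)": lambda f, x: math.isclose(f(math.pi - x), f(x), abs_tol=1e-9),
    "sin(-x) == -sin(x)":    lambda f, x: math.isclose(f(-x), -f(x), abs_tol=1e-9),
}

def prioritize(mrs, mutants, inputs):
    """Rank MRs by how many mutants they 'kill' (violate the relation on at
    least one input) -- a mutation-testing proxy for fault-finding ability."""
    kills = {name: sum(any(not check(m, x) for x in inputs) for m in mutants)
             for name, check in mrs.items()}
    return sorted(kills, key=kills.get, reverse=True)

# Hand-made mutants standing in for seeded faults in the SUT.
mutants = [lambda x: math.sin(x) + 0.1,   # constant offset
           lambda x: abs(math.sin(x))]    # sign fault
print(prioritize(MRS, mutants, inputs=[0.3, 1.2])[0])  # prints: sin(-x) == -sin(x)
```

Note that the constant-offset mutant survives the first MR (the offset cancels on both sides) but not the second, which is exactly why MRs differ in effectiveness and why applying the highest-killing ones first speeds up fault detection.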