Theses and Dissertations at Montana State University (MSU)
Permanent URI for this community: https://scholarworks.montana.edu/handle/1/732
Item
Large-scale automated human protein-phenotype relation extraction from biomedical literature (Montana State University - Bozeman, College of Engineering, 2020)
Pourreza Shahri, Morteza; Chairperson, Graduate Committee: Indika Kahanda
Identifying protein-phenotype relations is of paramount importance for applications such as uncovering rare and complex diseases. The Human Phenotype Ontology (HPO) is a recently introduced standardized vocabulary for describing disease-related phenotypic abnormalities in humans. While the official HPO knowledge base maintains known associations between human proteins and HPO terms, it is widely believed to be incomplete. However, due to the exponential growth of biomedical literature, timely manual curation is infeasible, creating a need for efficient and accurate computational tools for automated curation. In this work, we present HPcurator, a novel two-step framework for extracting relations between proteins and HPO terms from biomedical literature. First, we implement ProPheno, a comprehensive online dataset composed of human protein-phenotype co-mentions extracted from the entire set of biomedical articles. Subsequently, we show that these co-mentions are useful as a complementary source of input for a different, but highly related, task of automated protein-phenotype prediction. Next, we develop a supervised machine learning model called PPPred, which, to the best of our knowledge, is the first predictive model that can classify the validity of a given sentence-level protein-phenotype co-mention. Using a gold standard dataset composed of manually curated sentence co-mentions, we demonstrate that PPPred significantly outperforms several baseline methods. Finally, we propose SSEnet, a novel deep semi-supervised ensemble framework for relation extraction that combines deep learning, semi-supervised learning, and ensemble learning.
This framework is motivated by the fact that while the manual annotation of co-mentions is prohibitively expensive, we have access to millions of unlabeled co-mentions. We develop a prototype of HPcurator by instantiating SSEnet with ProPheno, self-learning, pre-trained language models, as well as convolutional and recurrent neural networks. This system can successfully output a ranked list of relevant sentences for a user-supplied protein-phenotype pair. Our experimental results indicate that this system provides state-of-the-art performance in human protein-HPO term relation extraction. The findings and the insight gained from this work have implications for biocurators, biologists, and the computer science community involved in developing biomedical text mining tools.

Item
Developing bio-inspired methodologies for encoding angular position from strain (Montana State University - Bozeman, College of Engineering, 2020)
Lange, Christopher William; Chairperson, Graduate Committee: Mark Jankauski
As mechanical systems rely more on closed-loop control, the sensors that supply feedback information are essential. Additionally, in systems where sensor function is critical, sensor redundancy is important to retain functionality if one or more sensors fail. Redundancy can be achieved through multiple high-fidelity sensors that measure the same type of information, such as gyroscopes or accelerometers. However, multiple high-fidelity sensors can increase cost significantly. This thesis explores the potential to replace or augment the functionality of angular position sensors using strain measurements. Strain gauges are already used in system health monitoring. By utilizing these already implemented sensors to measure angular position, we can remove the additional cost of redundant angular position sensors. However, for complex systems, the mapping between strain and angular position is unclear.
By incorporating reduced-order, physics-based models into machine learning techniques, we can efficiently transform high-order strain data into angular position. To demonstrate the potential of using alternative sensing methods, we developed a reduced-order model of a parametrically excited flexible pendulum. Inspiration for this simplified system comes from insect halteres, which are small sensory organs, evolved from insect hind wings, that provide rapid information about body rotation. The parametrically excited flexible pendulum allows a single axis of rotation and a single direction of flexibility to be paired and their relationship studied. By varying parameters within the model, such as pendulum length and modulus as well as parametric excitation amplitude and frequency, the Gaussian process regression can be optimized to reduce training time and increase prediction accuracy on unseen data. Inputs of strain and parametric excitation position, along with their respective first and second derivatives, are then analyzed to determine which inputs are interrelated and therefore unnecessary, thus reducing the input required. This provides the essential first steps towards using machine learning to implement multi-sensor, deformation-based, multi-axial angular position sensing in complex systems.

Item
Predicting anticancer peptides and protein function with deep learning (Montana State University - Bozeman, College of Engineering, 2020)
Lane, Nathaniel Patrick; Chairperson, Graduate Committee: Indika Kahanda
Anticancer peptides (ACPs) are a promising alternative to traditional chemotherapy. To aid wet-lab and clinical research, there is a growing interest in using machine learning techniques to help identify good ACP candidates computationally. In this work, we develop DeepACPpred, a novel deep learning model for predicting ACPs using their amino acid sequences.
Using several gold-standard ACP datasets, we demonstrate that DeepACPpred is highly effective compared to state-of-the-art ACP prediction models. Furthermore, we adapt the above neural network model for predicting protein function and report our experience with participating in a community-wide large-scale assessment of protein functional annotation tools.

Item
Predicting metamorphic relations: an evaluation of program representations and machine learning techniques (Montana State University - Bozeman, College of Engineering, 2020)
Rahman, Karishma; Chairperson, Graduate Committee: Upulee Kanewala; Upulee Kanewala was a co-author of the article 'Predicting metamorphic relations for matrix calculation programs' in 'MET18: Proceedings of the 3rd International Workshop on Metamorphic Testing', which is contained within this thesis.
Testing complex scientific applications can often be a complicated and expensive procedure. A test oracle is used to verify the behavior of the software under test. However, difficulties in implementing a test oracle make the process of systematically testing scientific applications more challenging. This problem is known as the oracle problem. Metamorphic testing (MT) is an effective technique to test these applications as it uses metamorphic relations (MRs) to determine whether test cases have passed or failed. Metamorphic relations are essential components of metamorphic testing that highly affect its fault detection effectiveness. MRs are usually identified with the help of a domain expert, which is a labor-intensive task. In this work, a previously developed graph kernel-based machine learning method is extended to predict MRs for functions that perform matrix calculations. Then, a semi-supervised support vector machine (S3VM) is used to build the predictive model for the suggested approach. Finally, call graph (CG) information of the functions is used to calculate the graph kernels to predict MRs.
Overall, the results show that the random walk kernel performs better than the graphlet kernel, and that semi-supervised learning can be effective with more unlabelled data. Also, the use of the call graph representation presents a new avenue of research in predicting MRs for unseen functions.