Theses and Dissertations at Montana State University (MSU)
Permanent URI for this community: https://scholarworks.montana.edu/handle/1/732

Item Large-scale automated human protein-phenotype relation extraction from biomedical literature
(Montana State University - Bozeman, College of Engineering, 2020)
Pourreza Shahri, Morteza; Chairperson, Graduate Committee: Indika Kahanda

Identifying protein-phenotype relations is of paramount importance for applications such as uncovering rare and complex diseases. Human Phenotype Ontology (HPO) is a recently introduced standardized vocabulary for describing disease-related phenotypic abnormalities in humans. While the official HPO knowledge base maintains known associations between human proteins and HPO terms, it is widely believed to be incomplete. However, due to the exponential growth of biomedical literature, timely manual curation is infeasible, creating a need for efficient and accurate computational tools for automated curation. In this work, we present HPcurator, a novel two-step framework for extracting relations between proteins and HPO terms from biomedical literature. First, we implement ProPheno, a comprehensive online dataset composed of human protein-phenotype co-mentions extracted from the entire set of biomedical articles. Subsequently, we show that these co-mentions are useful as a complementary source of input for a different, but highly related, task of automated protein-phenotype prediction. Next, we develop a supervised machine learning model called PPPred, which, to the best of our knowledge, is the first predictive model that can classify the validity of a given sentence-level protein-phenotype co-mention. Using a gold standard dataset composed of manually curated sentence co-mentions, we demonstrate that PPPred significantly outperforms several baseline methods. Finally, we propose SSEnet, a novel deep semi-supervised ensemble framework for relation extraction that combines deep learning, semi-supervised learning, and ensemble learning.
This framework is motivated by the fact that while the manual annotation of co-mentions is prohibitively expensive, we have access to millions of unlabeled co-mentions. We develop a prototype of HPcurator by instantiating SSEnet with ProPheno, self-learning, pre-trained language models, as well as convolutional and recurrent neural networks. This system can successfully output a ranked list of relevant sentences for a user-supplied protein-phenotype pair. Our experimental results indicate that this system provides state-of-the-art performance in human protein-HPO term relation extraction. The findings and the insight gained from this work have implications for biocurators, biologists, and the computer science community involved in developing biomedical text mining tools.

Item Predicting anticancer peptides and protein function with deep learning
(Montana State University - Bozeman, College of Engineering, 2020)
Lane, Nathaniel Patrick; Chairperson, Graduate Committee: Indika Kahanda

Anticancer peptides (ACPs) are a promising alternative to traditional chemotherapy. To aid wet-lab and clinical research, there is growing interest in using machine learning techniques to help identify good ACP candidates computationally. In this work, we develop DeepACPpred, a novel deep learning model for predicting ACPs using their amino acid sequences. Using several gold-standard ACP datasets, we demonstrate that DeepACPpred is highly effective compared to state-of-the-art ACP prediction models.
Furthermore, we adapt the above neural network model for predicting protein function and report our experience with participating in a community-wide large-scale assessment of protein functional annotation tools.

Item Predicting metamorphic relations: an evaluation of program representations and machine learning techniques
(Montana State University - Bozeman, College of Engineering, 2020)
Rahman, Karishma; Chairperson, Graduate Committee: Upulee Kanewala; Upulee Kanewala was a co-author of the article, 'Predicting metamorphic relations for matrix calculation programs' in the 'MET18: Proceedings of the 3rd International Workshop on Metamorphic Testing' which is contained within this thesis.

Testing complex scientific applications can often be a complicated and expensive procedure. A test oracle is used to verify the behavior of the software under test. However, the difficulty of implementing a test oracle makes systematic testing of scientific applications more challenging. This problem is known as the oracle problem. Metamorphic testing (MT) is an effective technique to test these applications, as it uses metamorphic relations (MRs) to determine whether test cases have passed or failed. Metamorphic relations are essential components of metamorphic testing that strongly affect its fault detection effectiveness. MRs are usually identified with the help of a domain expert, which is a labor-intensive task. In this work, a previously developed graph kernel-based machine learning method is extended to predict MRs for functions that perform matrix calculations. Then, a semi-supervised support vector machine (S3VM) is used to build the predictive model for the suggested approach. Finally, call graph (CG) information of the functions is used to calculate the graph kernels to predict MRs. The overall result shows that the random walk kernel performs better than the graphlet kernel, and that semi-supervised learning can be effective with more unlabelled data.
Also, the use of call graph representation presents a new avenue of research in predicting MRs for unseen functions.
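
To make the idea of a metamorphic relation concrete, the following is a minimal Python sketch; it is not taken from the thesis. The function under test (matrix_trace), the deliberately faulty variant (buggy_trace), the two relations, and every helper name are hypothetical choices made for illustration. The sketch only shows how MRs can stand in for a test oracle on a matrix calculation; it does not show how MRs are predicted.

```python
import random


def matrix_trace(m):
    """Function under test (hypothetical): sum of the main diagonal."""
    return sum(m[i][i] for i in range(len(m)))


def buggy_trace(m):
    """Faulty variant (hypothetical): sums the first row instead."""
    return sum(m[0])


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*m)]


def scale(m, k):
    """Multiply every entry of the matrix by the scalar k."""
    return [[k * x for x in row] for row in m]


def holds_transpose_mr(f, m):
    """MR 1 (assumed for this example): f(M^T) == f(M)."""
    return f(transpose(m)) == f(m)


def holds_scaling_mr(f, m, k):
    """MR 2 (assumed for this example): f(k * M) == k * f(M)."""
    return f(scale(m, k)) == k * f(m)


def run_metamorphic_tests(f, trials=100, size=3, seed=0):
    """Return True if f satisfies both MRs on every random integer matrix.

    No expected output is ever computed: each check only compares the
    result on a source input with the result on a transformed input.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        m = [[rng.randint(-9, 9) for _ in range(size)] for _ in range(size)]
        k = rng.randint(-5, 5)
        if not (holds_transpose_mr(f, m) and holds_scaling_mr(f, m, k)):
            return False
    return True


if __name__ == "__main__":
    print("matrix_trace satisfies MRs:", run_metamorphic_tests(matrix_trace))
    print("buggy_trace satisfies MRs:", run_metamorphic_tests(buggy_trace))
```

Integer matrices are used so that exact equality is safe; with floating-point entries a tolerance-based comparison would be needed. Note that the faulty variant still satisfies the scaling relation and is only exposed by the transpose relation, which illustrates why the choice of MRs affects fault detection effectiveness.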