
dc.contributor.advisor: Chairperson, Graduate Committee: Indika Kahanda
dc.contributor.author: Anani, Mohammad
dc.description.abstract: Research on mental disorders has been largely based on manuals such as the ICD-10 (International Classification of Diseases) and DSM-V (the Diagnostic and Statistical Manual of Mental Disorders), which rely on the signs and symptoms of disorders for classification. However, this approach tends to overlook the underlying mechanisms of brain disorders and does not capture the heterogeneity of those conditions. Thus, the National Institute of Mental Health (NIMH) introduced a new framework for mental illness research, namely, Research Domain Criteria (RDoC). RDoC is a research framework that uses various units of analysis, from genetics, neural circuits, etc., for accurate multi-dimensional classification of mental illnesses. The RDoC framework is manually updated with units of analysis in periodic workshops, where domain experts search the literature for relevant evidence. Given the large volume of relevant biomedical research, a method that automates the extraction of evidence from the biomedical literature to assist with the curation of the RDoC matrix is key. In this thesis, we formulate three tasks that would be necessary for an automated biocuration pipeline for RDoC: 1) Labeling biomedical articles with RDoC constructs, 2) Retrieval of brain research articles, and 3) Extraction of relevant data from these articles. We model the first task as a multilabel classification problem over 26 RDoC constructs, using a gold-standard dataset of annotated PubMed abstracts and various supervised classification algorithms. The second task identifies general PubMed abstracts relevant to brain research, training a model on the same data from the first task together with other unlabeled abstracts. Finally, for the third task, we attempt to extract Problem, Intervention, Comparison, and Outcomes (PICO) elements and brain region mentions from a subset of the RDoC abstracts. To the best of our knowledge, this is the first study aimed at automated data extraction and retrieval of RDoC-related literature. The results of automating these tasks are promising: we have a very accurate multilabel classification model, a good retrieval model, and an accurate brain region extraction model.
dc.publisher: Montana State University - Bozeman, College of Engineering
dc.subject.lcsh: Mental illness
dc.subject.lcsh: Information retrieval
dc.subject.lcsh: Machine learning
dc.title: Exploring the feasibility of an automated biocuration pipeline for research domain criteria
dc.rights.holder: Copyright 2019 by Mohammad Anani
Graduate Committee: Upulee Kanewala; Matt Kuntz; Brendan Mumey.
School of Computing.
