Machine learning for pangenomics

DC Field | Value | Language
dc.contributor.advisor | Chairperson, Graduate Committee: Brendan Mumey | en
dc.contributor.author | Manuweera, Buwani Sakya | en
dc.contributor.other | This is a manuscript-style paper that includes co-authored chapters. | en
dc.date.accessioned | 2022-11-09T22:47:42Z |
dc.date.available | 2022-11-09T22:47:42Z |
dc.date.issued | 2021 | en
dc.description.abstract | Finding genotype-phenotype associations is an important task in biology. Most existing reference-based methods introduce biases because they use a single genome from one individual as the reference sequence, and these biases can limit the genotype-phenotype associations that can be inferred. Advances in sequencing techniques have enabled access to a large number of sequenced genomes from multiple organisms from different species. These can be used to create a pangenome, which represents a collection of genetic information from multiple organisms. Using a pangenome can reduce these limitations because it does not require a reference. Recently, machine learning techniques have emerged as effective methods for problems involving genomics and pangenomics data. Kernel methods are used as part of machine learning models to compute similarities between instances. Kernels can map the given data into a different feature space that can help separate the data into their corresponding classes. In this work, we develop supervised machine learning models using a set of features gathered from pangenomic graphs, and we evaluate the effectiveness of those features in predicting yeast phenotypes. We first evaluated the effectiveness of the features using a traditional supervised machine learning model and then compared it to novel custom kernels that incorporate information from the pangenomic graph structure. Experimental results using yeast phenotypes indicate that the developed machine learning models that use reference-free features and novel kernels outperform models based on traditional reference-based features. This work has implications for bioinformaticians and computational biologists working with pangenomes, as well as for computer scientists developing predictive models for genomic data. | en
dc.identifier.uri | https://scholarworks.montana.edu/handle/1/16638 | en
dc.language.iso | en | en
dc.publisher | Montana State University - Bozeman, College of Engineering | en
dc.rights.holder | Copyright 2021 by Buwani Sakya Manuweera | en
dc.subject.lcsh | Biology | en
dc.subject.lcsh | Genomics | en
dc.subject.lcsh | Phenotype | en
dc.subject.lcsh | Machine learning | en
dc.subject.lcsh | Algorithms | en
dc.title | Machine learning for pangenomics | en
dc.type | Thesis | en
mus.data.thumbpage | 68 | en
thesis.degree.committeemembers | Members, Graduate Committee: Jennifer A. Lachowiec | en
thesis.degree.department | Computing. | en
thesis.degree.genre | Thesis | en
thesis.degree.name | MS | en
thesis.format.extentfirstpage | 1 | en
thesis.format.extentlastpage | 79 | en
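
The abstract describes kernels as functions that compute similarities between instances and can be used inside supervised models. As a generic illustration only, and not the custom pangenome-graph kernels developed in this thesis, the Python sketch below assumes each yeast strain is encoded as a hypothetical 0/1 vector recording the presence or absence of pangenome graph features. It plugs a Jaccard (Tanimoto) kernel, a standard similarity for binary vectors, into a kernel perceptron, a simple swapped-in kernel classifier. The toy data, labels, and function names are all hypothetical.

```python
from typing import Callable, List, Sequence

Vector = Sequence[int]
Kernel = Callable[[Vector, Vector], float]


def jaccard_kernel(a: Vector, b: Vector) -> float:
    """Jaccard/Tanimoto similarity of two 0/1 vectors: |a AND b| / |a OR b|."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    # Two all-zero vectors are treated as identical.
    return inter / union if union else 1.0


def gram_matrix(X: List[Vector], kernel: Kernel) -> List[List[float]]:
    """Pairwise kernel values K[i][j] = kernel(X[i], X[j])."""
    return [[kernel(xi, xj) for xj in X] for xi in X]


def train_kernel_perceptron(X: List[Vector], y: List[int], kernel: Kernel,
                            epochs: int = 10) -> List[int]:
    """Dual-form perceptron; labels must be -1 or +1. Returns per-sample mistake counts."""
    n = len(X)
    K = gram_matrix(X, kernel)
    alpha = [0] * n
    for _ in range(epochs):
        mistakes = 0
        for i in range(n):
            # Decision value for sample i: kernel-weighted vote of earlier mistakes.
            score = sum(alpha[j] * y[j] * K[j][i] for j in range(n))
            if y[i] * score <= 0:
                alpha[i] += 1
                mistakes += 1
        if mistakes == 0:  # training data separated; stop early
            break
    return alpha


def predict(X_train: List[Vector], y_train: List[int], alpha: List[int],
            kernel: Kernel, x: Vector) -> int:
    """Classify x by the sign of its kernel-weighted similarity to training samples."""
    score = sum(a * yt * kernel(xt, x) for a, yt, xt in zip(alpha, y_train, X_train))
    return 1 if score > 0 else -1


# Hypothetical toy data: 4 strains x 5 graph features, with a binary phenotype label.
X = [[1, 1, 0, 0, 0], [1, 1, 1, 0, 0], [0, 0, 0, 1, 1], [0, 0, 1, 1, 1]]
y = [1, 1, -1, -1]
alpha = train_kernel_perceptron(X, y, jaccard_kernel)
print(alpha)  # [1, 0, 1, 0]
print(predict(X, y, alpha, jaccard_kernel, [1, 0, 1, 0, 0]))  # 1
print(predict(X, y, alpha, jaccard_kernel, [0, 0, 0, 0, 1]))  # -1
```

Because the classifier sees the data only through kernel(a, b), a different similarity (for example, one that also weights features by their position in a pangenome graph) could be swapped in without changing the training loop; that variant is likewise only an assumption here.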

Files

Original bundle

Name: manuweera-machine-2021.pdf
Size: 10.77 MB
Format: Adobe Portable Document Format
Description: Machine learning for pangenomics (PDF)

License bundle

Name: license.txt
Size: 826 B
Format: Plain Text
Description:
Copyright (c) 2002-2022, LYRASIS. All rights reserved.