Machine learning for pangenomics

Manuweera, Buwani Sakya

dc.contributor.advisor	Chairperson, Graduate Committee: Brendan Mumey	en
dc.contributor.author	Manuweera, Buwani Sakya	en
dc.contributor.other	This is a manuscript style paper that includes co-authored chapters.	en
dc.date.accessioned	2022-11-09T22:47:42Z
dc.date.available	2022-11-09T22:47:42Z
dc.date.issued	2021	en
dc.description.abstract	Finding genotype-phenotype associations is an important task in biology. Most existing reference-based methods introduce biases because they use a single genome from one individual as the reference sequence. These biases can limit the genotype-phenotype associations that are inferred. Advances in sequencing techniques have enabled access to a large number of sequenced genomes from multiple organisms from different species. These can be used to create a pangenome, which represents a collection of genetic information from multiple organisms. Using a pangenome can reduce these limitations because it does not require a reference. Recently, machine learning techniques have emerged as effective methods for problems involving genomics and pangenomics data. Kernel methods are used as part of machine learning models to compute similarities between instances. Kernels can map a given set of data into a different feature space in which the data are easier to separate into their corresponding classes. In this work, we develop supervised machine learning models using a set of features gathered from pangenomic graphs, and we evaluate the effectiveness of those features in predicting yeast phenotypes. We first evaluated the features using a traditional supervised machine learning model and then compared it to novel custom kernels that incorporate information from the pangenomic graph structure. Experimental results using yeast phenotypes indicate that the developed machine learning models that use reference-free features and novel kernels outperform models based on traditional reference-based features. This work has implications for bioinformaticians and computational biologists working with pangenomes, as well as for computer scientists developing predictive models for genomic data.	en
dc.identifier.uri	https://scholarworks.montana.edu/handle/1/16638	en
dc.language.iso	en	en
dc.publisher	Montana State University - Bozeman, College of Engineering	en
dc.rights.holder	Copyright 2021 by Buwani Sakya Manuweera	en
dc.subject.lcsh	Biology	en
dc.subject.lcsh	Genomics	en
dc.subject.lcsh	Phenotype	en
dc.subject.lcsh	Machine learning	en
dc.subject.lcsh	Algorithms	en
dc.title	Machine learning for pangenomics	en
dc.type	Thesis	en
mus.data.thumbpage	68	en
thesis.degree.committeemembers	Members, Graduate Committee: Jennifer A. Lachowiec	en
thesis.degree.department	Computing.	en
thesis.degree.genre	Thesis	en
thesis.degree.name	MS	en
thesis.format.extentfirstpage	1	en
thesis.format.extentlastpage	79	en

Files

Original bundle

Name: manuweera-machine-2021.pdf
Size: 10.77 MB
Format: Adobe Portable Document Format
Description: Machine learning for pangenomics (PDF)

Collections

Theses and Dissertations at Montana State University (MSU)