Theses and Dissertations at Montana State University (MSU)
Permanent URI for this community: https://scholarworks.montana.edu/handle/1/732
7 results
Search Results
Item An exploration of whole-genome comparative genomic strategies for polyploid crop genomes (Montana State University - Bozeman, The Graduate School, 2022) Reynolds, Gillian Lucy; Co-chairs, Graduate Committee: Brendan Mumey and Jennifer A. Lachowiec
Genome comparison for large and complex polyploid crop genomes is a highly challenging venture, yet it is critical. Given a rising demand for food coupled with yield-impacting resource limitations and rapidly changing global climates, it has never been more important to characterize the underlying genetic variation which underpins traits of agronomic interest. In this work, the problem of polyploid genome comparison is explored at three levels. The first chapter characterizes the sequence relationships that exist between, and within, polyploid genomes. This is achieved by hijacking a metagenomic strategy for rapid and efficient genome sequence classification. The second chapter then utilizes the identified subgenome-specific k-mer profiles for recruitment of assembled contigs and scaffolds previously only recruitable via more resource-intensive optical mapping strategies. This makes a greater proportion of the assembled data usable for downstream variant analysis. The third chapter then zooms into the problem of how to identify variants from large-scale sequencing data while minimizing bias and computational costs. A critical assessment of modern variant calling for crop genomes is performed, and an algorithm to further extend a new, resource-efficient approach for large-scale comparative genomics is presented and critically evaluated.
In all, the work presented herein takes a top-down journey from genome- and subgenome-level comparative genomics all the way to identifying base-pair resolution strategies that are capable of revealing the underlying sequences responsible for keeping the world fed.

Item Computational investigation on protein sequencing and genome rearrangement problems (Montana State University - Bozeman, College of Engineering, 2018) Qingge, Letu; Chairperson, Graduate Committee: Binhai Zhu
De novo protein sequencing and genome rearrangement problems are classical problems in bioinformatics. The de novo protein sequencing problem is to determine the whole sequence of amino acids from mass spectrometry data without using a database search. Genome rearrangement problems try to recognize the evolutionary process between two species. In this dissertation, first, we describe the process of constructing target protein sequences by utilizing mass spectrometry-based data from both top-down and bottom-up tandem mass spectra. In addition to using data from mass spectrometry analysis, we also utilize techniques for de novo protein sequencing using a homologous protein sequence as a reference to attempt to fill in any remaining gaps in the constructed protein scaffold. Initial results for analysis on real datasets yield 96-100% coverage and 73-91% accuracy with respect to the target protein sequence. Second, we use different genome rearrangement operations to transform one genome to another such that the similarity between two genomes is maximized. We explore these problems in terms of theoretical and experimental analysis. For the problem of sorting unsigned genomes by double cut and join (DCJ) operations, we design a randomized fixed-parameter tractable (FPT) approximation algorithm for computing the DCJ distance with an approximation factor of 4/3 + ε and a running time of O*(2^(d*)), where d* represents the optimal DCJ distance.
For the one-sided exemplar adjacency number problem, we reformulate the problem as maximum independent set in a colored interval graph and hence reduce the number of appearances of each gene to at most two. Moreover, we design a factor-2 approximation and also show that the approximation factor cannot be improved below 2 by some local search technique. Finally, we apply integer linear programming to solve the reduced instance exactly. For the minimum copy number generation problem, we analyze the complexity of different variations of this problem and show a practical algorithm for the general case based on a greedy method.

Item Analysis of archaeal viruses and characterization of their communities in Yellowstone National Park (Montana State University - Bozeman, College of Letters & Science, 2014) Bolduc, Benjamin Ian; Chairperson, Graduate Committee: Mark J. Young; Daniel P. Shaughnessy, Yuri I. Wolf, Eugene Koonin, Francisco F. Roberto, and Mark J. Young were co-authors of the article, 'Identification of novel positive-strand RNA viruses by metagenome analysis of archaea-dominated Yellowstone hot springs' in the journal 'Journal of virology' which is contained within this thesis.; Jennifer Wirth, Aurélien Mazurie, and Mark J. Young were co-authors of the article, 'Viral community composition in Yellowstone acidic hot springs assessed by network analysis' submitted to the journal 'ISME Journal' which is contained within this thesis.; Mark J. Young was a co-author of the article, 'Characterization of viral communities in Yellowstone hot springs by deep-sequencing' which is contained within this thesis.
Viruses infecting the Archaea - the third domain of life - are the least understood of all viruses. Although only 100 archaeal viruses have been described, work on these viruses has revealed a remarkable level of morphological and genetic diversity unmatched by their bacterial and eukaryotic counterparts, of which more than 6000 have been described.
Study of these archaeal viruses could yield insight into fundamental aspects of biology and reveal underlying evolutionary connections spanning the three domains of life, including the origin of life. In addition, we understand very little about their community structures in natural environments. To address these daunting tasks, a viral metagenomics approach was undertaken using next generation sequencing technologies. Despite this, only a fragmented view of the viral communities is possible in natural ecosystems. Therefore, this dissertation sought to apply a network-based approach in combination with viral metagenomics to not only describe natural viral communities, but to find and characterize the first RNA viruses from acidic, high-temperature hot springs in Yellowstone National Park, USA. These hot springs harbor low complexity cellular communities dominated by several species of hyperthermophilic Archaea. The results of this dissertation show that this approach can identify distinct viral populations and provide insights into the viral community. Furthermore, the viral communities of these hot springs are relatively stable over the course of the sampling time period. In addition, a number of viral clusters - each representing a viral family at the taxonomic level - are likely previously uncharacterized DNA viruses infecting archaeal hosts. This approach demonstrates the utility of combining viral community sequencing with a network analysis to understand viral community structures in natural ecosystems. Additional analysis of these viral metagenomes led to the identification of novel RNA viral genome segments. Since no RNA virus infecting Archaea is known to exist, this dissertation also sought to more fully characterize these sequences.
Genes for RNA-dependent RNA polymerases, a hallmark of positive-strand RNA viruses, were identified, suggesting the existence of novel positive-strand RNA viruses that likely replicate in hyperthermophilic archaeal hosts, are highly divergent from RNA viruses infecting eukaryotes, and are even more distant from known bacterial RNA viruses.

Item Proteomic analyses of Sulfolobus solfataricus: an extremophilic archaeon (Montana State University - Bozeman, College of Letters & Science, 2003) Barry, Richard Cornelius

Item A bioinformatic analysis of the mononegavirales transcription/replication complex through the development of the Dissic pipeline (Montana State University - Bozeman, College of Letters & Science, 2013) Cleveland, Sean Bruce; Chairperson, Graduate Committee: Marcella A. McClure; John Davies and Marcella A. McClure were co-authors of the article, 'A bioinformatics approach to the structure, function, and evolution of the nucleoprotein of the order mononegavirales' in the journal 'PLOS one' which is contained within this thesis.; Marcella A. McClure was a co-author of the article, 'Disorder, intra-residue contact and coevolution prediction of the large subunit polymerase and phosphoprotein for the order Mononegavirale using the DisICC pipeline' submitted to the journal 'PeerJ' which is contained within this thesis.
The viral members of the Order Mononegavirales are responsible for numerous diseases with high mortality and few if any treatments. Unfortunately, knowledge of these viruses is limited. Attempts to study the structure of the replication/transcription complex of these viruses using physical methods like X-ray crystallography and NMR spectroscopy have been largely unsuccessful due to the large size of this complex, as well as the amount of disorder these proteins show when isolated.
The goal of this bioinformatic study is to investigate sequence conservation in relation to evolutionary function/structure of the nucleoprotein (N), large subunit polymerase protein (L) and phosphoprotein (P) of the Order Mononegavirales. In the combined analysis, predictions for 63 representative viruses from the four viral families (Paramyxoviridae, Rhabdoviridae, Filoviridae, and Bornaviridae) were made using a purpose-built Disorder, Intra-residue contact and Compensatory mutation Correlator (DisICC) pipeline. The N protein results indicate conservation for disorder in the C-terminus region of the N viral proteins important for interacting with P and L during transcription and replication. Portions of the N-terminus are responsible for N:N stability with interactions identified by the presence or lack of co-evolving intra-protein contact predictions. Correlations between location and conservation of predicted regions reveal strong divisions between families while highlighting conservation within individual families in L, suggesting that L Domains are conserved across the Order with strong intra-sequence pressures for conservation, while hinge regions lack these pressures. Conserved disorder is reported for: the amino-terminal of L for L-L complex formation across all families, Domain V for capping activity across Paramyxovirinae and Vesiculovirus, and Domain VI for cap methylation across Paramyxovirinae, Rubulaviruses, Avulaviruses, Ferlavirus and Morbilliviruses. The P sequences show a strong conservation of disorder within viral families that corresponds to their binding Domains with little intra-sequence pressure.
Validation of these predictions by current experimental and structural information illustrates the benefits of the DisICC pipeline for characterizing protein disorder and intra-residue contact that can reveal likely residues as disruption targets in these viruses that are infectious to humans.

Item Proteomic and systems biology analysis of the response of monocytes to infection by Coxiella burnetii and exposure to innate immune adjuvants (Montana State University - Bozeman, College of Letters & Science, 2010) Shipman, Matthew Richard; Chairperson, Graduate Committee: Edward Dratz.
Coxiella burnetii is an obligate intracellular pathogen that infects human monocytes, specifically inhabiting the phagolysosome. C. burnetii is a potential bioterror agent and is classified by the National Institute of Allergy and Infectious Diseases (NIAID) as a category B pathogen. This bacterium is remarkably infectious, requiring as few as one bacterium to cause infection. We used phase II C. burnetii, an avirulent laboratory strain that acts as a model for wild type phase I strains. Our research was directed towards a deeper understanding of the monocyte proteome in response to a) infection by phase II C. burnetii, and b) exposure to immune adjuvants known to increase monocyte resistance to infection by C. burnetii. Monomac I cells were infected with phase II C. burnetii and aliquots were taken at 24, 48, and 96 hours postinfection. Experiments with immune adjuvants that increase monocyte killing of C. burnetii involved Monomac I cells treated with Securinine, E. coli lipopolysaccharide (LPS), and monophosphoryl lipid A (MPL). Securinine is a GABA A receptor antagonist that is being developed at Montana State University for biodefense purposes, and triggers an innate immune response that differs from classic Toll-like receptor (TLR) stimulation of innate immunity represented by LPS and MPL.
We employed multiplex 2D gel electrophoresis (m2DE) using ZDyes, a new generation of covalent fluorescent protein dyes being developed at Montana State University, coupled with MS/MS analysis and bioinformatics to determine the proteome changes in Monomac I cells in response to the conditions described above, and to develop a preliminary mechanistic model using a systems biology approach to account for the observed changes and propose multiple testable hypotheses to focus downstream research efforts. We also tested the effects on Monomac I cells infected with phase II C. burnetii +/- Securinine. We observed a high proportion of cell death in the + Securinine samples, using a dosage of Securinine higher than the optimal effective dosage. The information derived from this experiment will be useful in monitoring the tendency towards cell death in Securinine-treated samples both from C. burnetii-infected monocytes and other cell types (e.g. neurons) that contain GABA A receptors.

Item Metaprogramming bioinformatics in the postgenomic era (Montana State University - Bozeman, College of Engineering, 2006) Ohler, Nathaniel Tobias; Chairperson, Graduate Committee: Brendan Mumey
The number of bioinformatics programs available is continuously growing, along with the knowledge required to run each individual program. As more programs become available, more complex combinations of these programs are being used by scientists. Workflow engines attempt to remove repetitive procedures from these combinations by saving and executing any group of the programs as one larger "meta-program". The output of one program is automatically directed to the input of another, thereby creating a "flow" of "work". The first part of this project is the design and development of an easy-to-use workflow editor that maintains the usability of the original programs and allows for the relatively simple addition of new programs.
The result is a workflow editor that currently allows access to over 170 bioinformatics programs, each with a simple, common, and descriptive interface. In the second part of this project a specific workflow application for protein antibody imprinting is examined. Part of the workflow, a program that produces a series of potential alignments for an antibody and a protein, has already been developed. An additional program is developed which uses a greedy search algorithm to select the "best" set of alignments. This alignment selection problem is shown to be NP-complete. These programs represent a real-world example of a bioinformatics workflow.
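
The "meta-program" idea in the last abstract, where the output of one program is automatically directed to the input of the next, can be illustrated with a minimal Python sketch. This is not the workflow editor described in the thesis. The names make_workflow, to_upper, reverse and complement are hypothetical, and the three small string functions are only stand-ins for real bioinformatics programs.

```python
from functools import reduce
from typing import Callable, Sequence

# A "program" is modeled here as a function from text to text (an assumption
# made for illustration; real tools would read and write files or streams).
Step = Callable[[str], str]


def make_workflow(steps: Sequence[Step]) -> Step:
    """Combine steps into one callable that feeds each output to the next step."""
    def run(data: str) -> str:
        return reduce(lambda value, step: step(value), steps, data)
    return run


# Hypothetical stand-ins for individual programs.
def to_upper(seq: str) -> str:
    return seq.upper()


def reverse(seq: str) -> str:
    return seq[::-1]


def complement(seq: str) -> str:
    # Swap each DNA base for its complement; other characters pass through.
    return seq.translate(str.maketrans("ACGT", "TGCA"))


# Three steps saved and executed as one larger "meta-program".
reverse_complement = make_workflow([to_upper, reverse, complement])

if __name__ == "__main__":
    print(reverse_complement("acgtt"))  # prints "AACGT"
```

Because the combined workflow has the same shape as a single step, it can itself be used as a step in a larger workflow.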