Theses and Dissertations at Montana State University (MSU)
Permanent URI for this community: https://scholarworks.montana.edu/handle/1/732
Item Investigating newer statistics instructors' breakthroughs with and motivations for using active learning: a longitudinal case-study and a multi-phase approach towards instrument development (Montana State University - Bozeman, College of Letters & Science, 2022) Meyer, Elijah Sterling; Co-Chairs, Graduate Committee: Stacey Hancock and Jennifer Green; This is a manuscript style paper that includes co-authored chapters.
National recommendations call for a shift from using lecture-based approaches to using approaches that engage students in the learning process, primarily through active learning techniques. Despite these recommendations, the adoption of active learning techniques for newer statistics instructors remains limited. The goal of this research is to provide a more holistic understanding about statistics instruction, specifically as it relates to recommended active learning techniques and newer statistics instructors, including graduate student instructors (GSIs). In this research, I present two studies. In the first study, we investigated GSIs' breakthroughs in their knowledge about, emotions towards, and use of active learning over time by using a longitudinal collective case-study approach. Survey, interview, and observation data across four semesters revealed that the GSIs' breakthroughs in their use of active learning only occurred after their increased knowledge about active learning aligned with their emotions towards it. This study further revealed that the GSIs needed to feel confident in and be challenged by their course structure before implementing active learning techniques. The second study builds upon these findings by exploring statistics instructors' motivations or reasons for using active learning. Under the self-determination theory framework, we conducted a multi-phase study to develop an instrument that measures four different types of motivational constructs for using group work, a specific active learning approach. 
We constructed items using expert opinion and cognitive interviews, and then we conducted two pilot studies with newer statistics instructors. The resulting reliability and validity evidence suggest that this instrument may help support future studies' investigations of motivation, helping us to better understand newer statistics instructors' use of active learning. Together, these studies may help inform future recommendations on how to support newer statistics instructors' early adoption of such techniques.

Item Bayesian hierarchical latent variable models for ecological data types (Montana State University - Bozeman, College of Letters & Science, 2022) Stratton, Christian Alexander; Chairperson, Graduate Committee: Jennifer Green and Andrew Hoegh (co-chair); This is a manuscript style paper that includes co-authored chapters.
Ecologists and environmental scientists employ increasingly complicated sampling designs to address research questions that can help explain the impacts of climate change, disease, and other emerging threats. To understand these impacts, statistical methodology must be developed to address the nuance of the sampling design and provide inferences about the quantities of interest; this methodology must also be accessible and easily implemented by scientists. Recently, hierarchical latent variable modeling has emerged as a comprehensive framework for modeling a variety of ecological data types. In this dissertation, we discuss hierarchical modeling of multi-scale occupancy data and multi-species abundance data. Within the multi-scale occupancy framework, we propose new methodology to improve computational performance of existing modeling approaches, resulting in a 98% decrease in computation time. This methodology is implemented in an R package developed to encourage community uptake of our method. 
Additionally, we propose a new modeling framework capable of simultaneous clustering and ordination of ecological abundance data that allows for estimation of the number of clusters present in the latent ordination space. This modeling framework is also extended to accommodate hierarchical sampling designs. The proposed modeling framework is applied to two data sets, and code to fit our model is provided. The software and statistical methodology proposed in this dissertation illustrate the flexibility of hierarchical latent variable modeling to accommodate a variety of data types.

Item Supporting data-intensive environmental science research: data science skills for scientific practitioners of statistics (Montana State University - Bozeman, College of Letters & Science, 2020) Theobold, Allison Shay; Chairperson, Graduate Committee: Stacey Hancock; Stacey Hancock was a co-author of the article, 'How environmental science graduate students acquire statistical computing skills' in the journal 'Statistics education research journal' which is contained within this dissertation.; Stacey Hancock and Sara Mannheimer were co-authors of the article, 'Designing data science workshops for data-intensive environmental science research' submitted to the journal 'Journal of statistics education' which is contained within this dissertation.; Stacey Hancock was a co-author of the article, 'Data science skills in data-intensive environmental science research: the case of Alicia and Ellie' submitted to the journal 'Harvard data science review' which is contained within this dissertation.
The importance of data science skills for modern environmental science research cannot be overstated, but graduate students in these fields typically lack these integral skills. Yet, over the last 20 years statistics preparation in these fields has grown to be considered vital, and statistics coursework has been readily incorporated into graduate programs. 
As 'data science' is the study of extracting value from data, the field shares a great deal of conceptual overlap with the field of Statistics. Thus, many environmental science degree programs expect students to acquire these data science skills in an applied statistics course. A gap exists, however, between the data science skills required for students' participation in the entire data analysis cycle as applied to independent research, and those taught in statistics service courses. Over the last ten years, environmental science and statistics educators have outlined the shape of the data science skills specific to research in their respective disciplines. Disappointingly, however, both sides of these conversations have ignored the area at the intersection of these fields, specifically the data science skills necessary for environmental science practitioners of statistics. This research focuses on describing the nature of environmental science graduate students' need for data science skills when engaging in the data analysis cycle, through the voice of the students. In this work, we present three qualitative studies, each investigating a different aspect of this need. First, we present a study describing environmental science students' experiences acquiring the computing skills necessary to implement statistics in their research. In-depth interviews revealed three themes in these students' paths toward computational knowledge acquisition: use of peer support, seeking out a 'singular consultant,' and learning through independent research. Motivated by the need for extracurricular opportunities for acquiring data science skills, next we describe research investigating the design and implementation of a suite of data science workshops for environmental science graduate students. 
These workshops fill a critical hole in the environmental science and statistics curricula, providing students with the skills necessary to retrieve, view, wrangle, visualize, and analyze their data. Finally, we conclude with research that works toward identifying key data science skills necessary for environmental science graduate students as they engage in the data analysis cycle.

Item Space-filling designs for mixture/process variable experiments (Montana State University - Bozeman, College of Letters & Science, 2021) Obiri, Moses Yeboah; Chairperson, Graduate Committee: John J. Borkowski
The ultimate objective of this dissertation was to present a statistical methodology and an algorithm for generating uniform designs for the combined mixture/process variable experiment. There are many methods available for constructing uniform designs, and four such methods have been used in this study. These are the Good Lattice Point (GLP) method, the cyclotomic field (CF) method, the square root sequence (SRS) method, and the power-of-a-prime (PP) method. A new hybrid algorithm is presented for generating uniform designs for mixture/process variable experiments. The algorithm uses the G function introduced by Fang and Yang (2000) and adopted by Borkowski and Piepel (2009) to map q - 1 points from q + k - 1 points generated in the hypercube to the simplex. Two new criteria based on the Euclidean Minimum Spanning Tree (EMST), which are more computationally efficient for assessing the uniformity of mixture designs and mixture/process variable designs, are presented. The two criteria were found to be interchangeable, and the geometric mean of the edge lengths (GMST) criterion is preferred to the average and standard deviation of edge lengths (adMST, sdMST) criterion. The GMST criterion uses only one statistic to quantify the uniformity properties of mixture and mixture/process variable designs. 
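As a rough illustration of a spanning-tree uniformity statistic of the kind this abstract describes, the sketch below builds the Euclidean minimum spanning tree of a set of design points with Prim's algorithm and returns the geometric mean of its edge lengths. The function names `emst_edge_lengths` and `gmst` are hypothetical, the exact definition used in the dissertation is not given in the abstract, and the code assumes all design points are distinct (a zero-length edge would make the logarithm undefined).

```python
import math


def emst_edge_lengths(points):
    """Return the n - 1 edge lengths of the Euclidean minimum spanning tree.

    Uses Prim's algorithm on the complete graph over the points (O(n^2)),
    which is adequate for small designs.
    """
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    in_tree[0] = True
    # Shortest known distance from each point to the growing tree.
    dist_to_tree = [math.inf] * n
    for j in range(1, n):
        dist_to_tree[j] = math.dist(points[0], points[j])
    lengths = []
    for _ in range(n - 1):
        # Attach the point that is currently closest to the tree.
        u = min((j for j in range(n) if not in_tree[j]),
                key=dist_to_tree.__getitem__)
        lengths.append(dist_to_tree[u])
        in_tree[u] = True
        for j in range(n):
            if not in_tree[j]:
                d = math.dist(points[u], points[j])
                if d < dist_to_tree[j]:
                    dist_to_tree[j] = d
    return lengths


def gmst(points):
    """Geometric mean of the EMST edge lengths (assumes distinct points)."""
    lengths = emst_edge_lengths(points)
    return math.exp(sum(math.log(e) for e in lengths) / len(lengths))


# Hypothetical example: the four corners of the unit square.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(gmst(square))
```

A single summary value such as this can be compared across candidate designs of the same size; how the dissertation ranks designs with it is not stated in the abstract.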
Tables of good uniform designs are provided for mixture experiments in the full simplex (S_q) for q = 3, 4, 5 and practical design sizes, 9 ≤ n ≤ 30, using the four number theoretic methods in this study. A conditional approach based on the GMST criterion for generating good uniform mixture/process variable designs is also introduced and tables of good uniform designs are given for the combined q-mixture and k process variable experiments for q = 3, 4, 5, k = 1, 2 and practical numbers of runs, 9 ≤ n ≤ 30. A new algorithm is provided to augment existing mixture design points with space-filling points including designs with existing clustered design points. In this algorithm, new design points are chosen from a candidate set of points such that the resulting augmented design has good space-filling properties. The SRS method is found to produce the best augmented space-filling mixture designs.

Item Monothetic cluster analysis with extensions to circular and functional data (Montana State University - Bozeman, College of Letters & Science, 2019) Tran, Tan Vinh; Chairperson, Graduate Committee: Mark Greenwood; Mark C. Greenwood was a co-author of the article, 'Choosing the number of clusters in monothetic cluster analysis' submitted to the journal 'Electronic journal of applied statistical analysis' which is contained within this dissertation.; Mark C. Greenwood, John C. Priscu and Marie Sabacka were co-authors of the article, 'Visualization and monothetic clustering data with circular variables' submitted to the journal 'Journal of environmental statistics' which is contained within this dissertation.; Mark C. Greenwood was a co-author of the article, 'Clustering on functional data' submitted to the journal 'PeerJ - the journal of life and environmental sciences' which is contained within this dissertation.; Mark C. 
Greenwood was a co-author of the article, 'Monothetic clustering and partitioning using local subregions: the R packages monoClust and PULS' submitted to the journal 'The journal of open source software' which is contained within this dissertation.
Monothetic clustering is a divisive clustering method that uses a hierarchical, recursive partitioning of multivariate responses based on binary decision rules that are built from individual response variables. This clustering technique is helpful for applications where the rules for grouping observations and for assigning new subjects to clusters are both important. Based on the ideas of classification and regression trees, a monothetic clustering algorithm was implemented in R to allow further explorations and modifications. One of the common problems in performing clustering is deciding whether a cluster structure is present and, if it is, how many clusters are 'enough'. Some well-established techniques are reviewed, and new methods based on cross-validation and permutation-based hypothesis tests at each split are suggested. Monothetic clustering is of interest in a variety of situations, including data sets with circular variables, where the variables' natures are not linear. A method for monothetic clustering and visualizations of clusters with circular variables was developed that could also be used in other classification and regression tree situations. Clustering is also interesting for data sets where the responses can be transformed into functional data, which has unique properties that need exploring. Partitioning Using Local Subregions (PULS), a clustering technique inspired by monothetic clustering to overcome some of its disadvantages in clustering functional data, is discussed. In this algorithm, clusters are formed based on aggregating the information from several variables or time intervals. 
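The single-variable binary rules that define monothetic clustering can be sketched as follows. This is a minimal illustration, not the monoClust implementation: it searches every variable and every midpoint between consecutive observed values, and it assumes the split is scored by the total within-cluster sum of squares, a criterion the abstract does not state. The names `within_ss` and `best_monothetic_split` are hypothetical.

```python
def within_ss(rows):
    """Sum of squared Euclidean distances from each row to the group mean."""
    if not rows:
        return 0.0
    p = len(rows[0])
    means = [sum(r[j] for r in rows) / len(rows) for j in range(p)]
    return sum((r[j] - means[j]) ** 2 for r in rows for j in range(p))


def best_monothetic_split(rows):
    """Find the best rule of the form 'variable j < cut'.

    Returns (j, cut, total within-cluster sum of squares), or None when no
    variable takes more than one distinct value.
    """
    best = None
    p = len(rows[0])
    for j in range(p):
        values = sorted({r[j] for r in rows})
        # Candidate cuts are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            cut = (lo + hi) / 2.0
            left = [r for r in rows if r[j] < cut]
            right = [r for r in rows if r[j] >= cut]
            total = within_ss(left) + within_ss(right)
            if best is None or total < best[2]:
                best = (j, cut, total)
    return best


# Hypothetical data: two groups separated on the first variable only.
data = [(1.0, 5.0), (1.2, 1.0), (0.8, 3.0),
        (9.0, 4.0), (9.2, 2.0), (8.8, 0.0)]
print(best_monothetic_split(data))
```

Applying the same search recursively to each resulting group would give the divisive hierarchy; because every rule involves one variable, a new observation can be assigned to a cluster by following the rules.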
In both monothetic clustering and PULS, it is possible to limit the set of feasible splitting variables so that new observations can be assigned to clusters without observing all variables or times. R packages for these methods have been developed for others to use and test and to support the proposed research, and a detailed vignette is provided for utilizing all the functions developed here.

Item Visual sample plan and prior information: what do we need to know to find UXO? (Montana State University - Bozeman, College of Letters & Science, 2016) Flagg, Kenneth A.; Writing Project Advisor: Megan Higgs
Military training and weapons testing activities leave behind munitions debris, including both inert fragments and explosives that failed to detonate. The latter are known as unexploded ordnance (UXO). It is important to find and dispose of UXO items that are located where people could come into contact with them and cause them to detonate. Typically there exists uncertainty about the locations of UXO items and the sizes of UXO-containing regions at a site, so statistical analyses are used to support decisions made while planning a site remediation project. The Visual Sample Plan software (VSP), published by the Pacific Northwest National Laboratory, is widely used by United States military contractors to guide sampling plan design and to identify regions that are likely to contain UXO. VSP has many features used for a variety of situations in UXO cleanup and other types of projects. This study focuses on the sampling plan and geostatistical mapping features used to find target areas where UXO may be present. The software produces transect sampling plans based on prior information entered by the user. After the sample data are collected, VSP estimates spatial point density using circular search windows and then uses Kriging to produce a continuous map of point density across the site. 
I reviewed the software's documentation and examined its output files to provide insight about how VSP does its computations, allowing the software's analyses to be closely reproduced and therefore better understood by users. I perform a simulation study to investigate the performance of VSP for identifying target areas at terrestrial munitions testing sites. I simulate three hypothetical sites, differing in the size and number of munitions use areas, and in the complexity of the background noise. Many realizations of each site are analyzed using methods similar to those employed by VSP to delineate regions of concentrated munitions use. I use the simulations to conduct two experiments, the first of which explores the sensitivity of the results to different search window sizes. I analyze two hundred realizations of the simplest site using the same sampling plan and five different window sizes. Based on the results, I select 90% of the minor axis of the target area of interest as the window diameter for the second experiment. The second experiment studies the effects of the prior information about the target area size and spatial point density of munitions items. For each site, I use four prior estimates of target area size and three estimates of point density to produce twelve sampling plans. One hundred realizations of each site are analyzed with each of the twelve sampling plans. I evaluate the analysis in terms of the detection rates of munitions items and target areas, the distances between undetected munitions items and identified areas, the total area identified, and other practical measures of the accuracy and efficiency of the cleanup effort. I conclude that the most accurate identification of target areas occurs when the sampling plan is based on the true size of the smallest target area present. 
The prior knowledge of the spatial point density has relatively little impact on the outcome.

Item Investigating the teaching of statistics with technology at the high school level through the use of annotated lesson plans (Montana State University - Bozeman, College of Letters & Science, 2016) Arnold, Elizabeth Grace; Chairperson, Graduate Committee: Elizabeth Burroughs
Throughout the last twenty years, data analysis and statistics content, together with the integration of technology in mathematics classrooms, have gained increasing attention in the United States at the K-12 level. National and state standards now emphasize statistics concepts throughout high school and there is a growing motivation to shift from a traditional formula-based style of teaching statistics to a more data-oriented approach emphasizing conceptual understanding and statistical literacy. To implement this approach in the classroom, it is necessary to integrate technology into the teaching of statistics. However, many in-service high school mathematics teachers are not familiar with this process, and statistics is still a relatively new subject for most. This discrepancy highlights the need to help foster and develop in-service high school mathematics teachers' ability to effectively use technology when teaching statistics. The goal of this study was to investigate how specially annotated lesson plans influence and guide in-service high school mathematics teachers' use of technology when teaching statistical concepts. I developed a completely randomized block experiment, using quantitative and qualitative measurements and methods of analysis. High school mathematics teachers were randomly assigned to receive an annotated or non-annotated statistics unit that included technology-based activities; four lessons were observed. 
The results of this study demonstrated how the process of helping teachers effectively use technology in the instruction of statistics is not straightforward; there was a large amount of variation in how teachers integrated technology and no consistent differences between the annotated and non-annotated group in this regard. All teachers, regardless of received unit, integrated technology more effectively when they were provided with a technology-based activity employing simulation. Teachers' integration of technology was most influenced by their awareness of the use of inquiry.

Item Statistics in the presence of cost: cost-considerate variable selection and MCMC convergence diagnostics (Montana State University - Bozeman, College of Letters & Science, 2016) Lerch, Michael David; Chairperson, Graduate Committee: Steve Cherry
The overarching objective of this research is to address and recognize the cost-benefit trade-off inherent in much of statistics. We identify two places where such a balance is present for researchers: variable selection and Markov chain Monte Carlo (MCMC) sampling. An easily identifiable source of cost in science occurs when taking measurements. Researchers measure variables to estimate another quantity based on a model. When model building, researchers may have access to a large number of variables to include in the model and may consider using a subset of the variables so that future uses of the model need only measure this subset rather than all variables. The researchers are incentivized to proceed in this manner if some variables are prohibitively expensive to measure for future uses of the model. In this research, we present a new algorithm for cost-considerate variable selection in linear modeling when confronted with this problem. Since overfitting may be a danger when many variables are at the disposal of the researcher, we build on the LARS and Lasso algorithms to perform cost-based variable selection in concert with model regularization. 
In MCMC sampling for Bayesian statistics, the cost-benefit trade-off is unavoidable. Researchers sampling from a posterior distribution must run a sampler for some number of iterations before finally stopping the sampler to make inference on the finite number of samples drawn. In this situation, the cost to be reduced is the time to run the sampler, with the recognition that the longer the sampler is run, the better the convergence. Time may not be as tangible a cost as a dollar figure, but increased wait time to perform analyses incurs the cost of running a computer and any negative effects associated with a delay as the researcher waits until the sampler has finished running. In this research, we introduce new convergence assessment tools in the form of a diagnostic and a plot. Unlike commonly used convergence diagnostics, these new tools focus explicitly on posterior quantiles and probabilities, which are common inferential objectives in Bayesian statistics. Additionally, we introduce equivalence testing to the convergence assessment domain by using it as the framework of the diagnostic.

Item Statistical methods in microbial disinfection assays (Montana State University - Bozeman, College of Letters & Science, 1997) DeVries, Todd Alan

Item Nonparametric estimation of semivariogram functions (Montana State University - Bozeman, College of Letters & Science, 1994) Cherry, John Steven