Statistics in the presence of cost : cost-considerate variable selection and MCMC convergence diagnostics

dc.contributor.advisor: Chairperson, Graduate Committee: Steve Cherry
dc.contributor.author: Lerch, Michael David
dc.date.accessioned: 2017-05-02T19:49:50Z
dc.date.available: 2017-05-02T19:49:50Z
dc.date.issued: 2016
dc.description.abstract: The overarching objective of this research is to recognize and address the cost-benefit trade-off inherent in much of statistics. We identify two places where researchers face such a balance: variable selection and Markov chain Monte Carlo (MCMC) sampling. An easily identifiable source of cost in science is the taking of measurements. Researchers measure variables in order to estimate another quantity based on a model. When building a model, researchers may have access to a large number of candidate variables and may consider using only a subset of them, so that future uses of the model need to measure only this subset rather than all variables. Researchers are incentivized to proceed in this manner if some variables are prohibitively expensive to measure in future uses of the model. In this research, we present a new algorithm for cost-considerate variable selection in linear modeling to address this problem. Since overfitting may be a danger when many variables are at the researcher's disposal, we build on the LARS and Lasso algorithms to perform cost-based variable selection in concert with model regularization. In MCMC sampling for Bayesian statistics, the cost-benefit trade-off is unavoidable. Researchers sampling from a posterior distribution must run a sampler for some number of iterations before stopping it and making inferences from the finite number of samples drawn. In this situation, the cost to be reduced is the time spent running the sampler, while recognizing that the longer the sampler runs, the better the convergence. Time may not be as tangible a cost as a dollar figure, but a longer wait to perform analyses incurs the cost of running a computer, along with any negative effects of the delay while the researcher waits for the sampler to finish. In this research, we introduce new convergence assessment tools in the form of a diagnostic and a plot.
Unlike commonly used convergence diagnostics, these new tools focus explicitly on posterior quantiles and probabilities, which are common inferential objectives in Bayesian statistics. Additionally, we introduce equivalence testing to the convergence assessment domain by using it as the framework for the diagnostic.
dc.identifier.uri: https://scholarworks.montana.edu/handle/1/12371
dc.language.iso: en
dc.publisher: Montana State University - Bozeman, College of Letters & Science
dc.rights.holder: Copyright 2016 by Michael David Lerch
dc.subject.lcsh: Statistics
dc.subject.lcsh: Cost
dc.subject.lcsh: Monte Carlo method
dc.subject.lcsh: Mathematical models
dc.title: Statistics in the presence of cost : cost-considerate variable selection and MCMC convergence diagnostics
dc.type: Dissertation
mus.data.thumbpage: 76
thesis.degree.committeemembers: Members, Graduate Committee: Megan Higgs; James Robison-Cox; John J. Borkowski; Mark Greenwood.
thesis.degree.department: Mathematical Sciences.
thesis.degree.genre: Dissertation
thesis.degree.name: PhD
thesis.format.extentfirstpage: 1
thesis.format.extentlastpage: 143
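The abstract does not spell out the cost-considerate selection algorithm, so the following is only a minimal, hypothetical sketch of the general idea of cost-weighted regularization. It is not the LARS-based algorithm developed in the dissertation. It assumes each variable j has a known, positive measurement cost `costs[j]` and fits a weighted Lasso, minimizing (1/(2n))‖y − Xβ‖² + λ Σ_j c_j |β_j| by cyclic coordinate descent, so that expensive variables must "earn" their way into the model. The function names `soft_threshold` and `cost_weighted_lasso` are illustrative.

```python
import numpy as np


def soft_threshold(x: float, t: float) -> float:
    """Soft-thresholding operator S(x, t) = sign(x) * max(|x| - t, 0)."""
    return float(np.sign(x) * max(abs(x) - t, 0.0))


def cost_weighted_lasso(X, y, costs, lam, n_iter=500, tol=1e-8):
    """Hypothetical sketch: minimize
        (1/(2n)) * ||y - X b||^2 + lam * sum_j costs[j] * |b_j|
    by cyclic coordinate descent.

    Assumes no intercept (center y and the columns of X beforehand),
    no all-zero columns, and strictly positive costs.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    costs = np.asarray(costs, dtype=float)
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n  # X_j^T X_j / n for each column
    resid = y - X @ beta               # running residual y - X b
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            old = beta[j]
            # Correlation of column j with the partial residual
            # (the residual with column j's current contribution added back).
            rho = X[:, j] @ resid / n + col_sq[j] * old
            # The threshold grows with the variable's cost.
            new = soft_threshold(rho, lam * costs[j]) / col_sq[j]
            if new != old:
                resid -= X[:, j] * (new - old)
                beta[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:
            break
    return beta
```

For instance, if two nearly collinear predictors carry the same information but one is assigned a cost ten times the other, the weighted penalty drives the expensive coefficient to exactly zero and keeps its cheaper substitute. An equivalent formulation is to divide each column of X by its cost and run an ordinary Lasso, then rescale the coefficients.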
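Likewise, the abstract describes, but does not define, a quantile-focused convergence diagnostic built on equivalence testing. The sketch below is a hypothetical illustration of that general idea, not the diagnostic proposed in the dissertation. It assumes two independent chains for the same scalar quantity, a user-chosen equivalence margin `margin` (the largest between-chain difference in a posterior quantile estimate that is considered negligible), and a rough batch-quantile approximation to each chain's Monte Carlo standard error. Following the two one-sided tests (TOST) logic, the chains are declared to agree on the quantile when the 1 − 2α confidence interval for their difference lies entirely inside (−margin, margin). The names `batch_quantile_se` and `quantile_equivalence` are illustrative.

```python
from statistics import NormalDist

import numpy as np


def batch_quantile_se(chain, q, n_batches=20):
    """Estimate the q-quantile of a chain with a rough batch-based standard error.

    The chain is split into contiguous batches; the spread of the per-batch
    q-quantiles, divided by sqrt(n_batches), approximates the standard error
    of the full-chain quantile estimate (an assumption for this sketch).
    """
    chain = np.asarray(chain, dtype=float)
    batch_q = np.array([np.quantile(b, q) for b in np.array_split(chain, n_batches)])
    est = float(np.quantile(chain, q))
    se = float(batch_q.std(ddof=1) / np.sqrt(n_batches))
    return est, se


def quantile_equivalence(chain_a, chain_b, q, margin, alpha=0.05, n_batches=20):
    """TOST-style check that two chains agree on the q-quantile within +/- margin.

    Returns (equivalent, (lower, upper)), where (lower, upper) is the
    1 - 2*alpha normal-approximation interval for the difference in estimates.
    """
    est_a, se_a = batch_quantile_se(chain_a, q, n_batches)
    est_b, se_b = batch_quantile_se(chain_b, q, n_batches)
    diff = est_a - est_b
    se = (se_a ** 2 + se_b ** 2) ** 0.5
    z = NormalDist().inv_cdf(1 - alpha)  # one-sided critical value
    lower, upper = diff - z * se, diff + z * se
    # Equivalence is claimed only if the whole interval is inside the margin.
    return (-margin < lower) and (upper < margin), (lower, upper)
```

Because equivalence is the alternative hypothesis here, a short or poorly mixing run produces a wide interval and fails to demonstrate agreement, rather than passing by default as it can when a test only checks for a detectable difference.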

Files

Original bundle

Name: LerchM1216.pdf
Size: 689.89 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 826 B
Format: Plain Text