New statistical methods for analyzing proteomics data from affinity isolation LC-MS/MS experiments
The field of proteomics is full of statistical problems waiting to be explored. To obtain information on protein complexes, interactions between protein pairs are examined first. This exploration uses 'bait-prey' protein pull-down assays that combine a protein affinity agent with protein identification by liquid chromatography-tandem mass spectrometry (LC-MS/MS). Each experiment generates a protein association matrix in which each column represents a sample from one bait protein, each row represents one prey protein, and each cell contains a presence/absence indicator. The presence/absence pattern of each prey protein is assessed with a Likelihood Ratio Test (LRT) and simulated LRT p-values. Fisher's Exact Test and a conditional frequency distribution test based on generating functions are also used to assess the prey protein observation pattern. Based on its p-value, each prey protein is assigned to a category (Specific or Non-Specific) and appraised with respect to the goal and design of the experiment. For each prey-bait pair in the 'Specific' category, the Bayes' Odds are calculated to estimate the posterior probability that the two proteins interact, and the results are compared with an approach used by Gilchrist et al. The method is illustrated using an experiment, conducted at the Proteomics Facility of Pacific Northwest National Laboratory (PNNL), that investigates protein complexes of Shewanella oneidensis MR-1. The example analysis yields results that are biologically sensible and more realistic than those of methods previously used to infer protein-protein associations. While inferring protein-protein associations is of great importance in proteomic studies, the quality of the data is of equal or greater importance, because depending on data quality, protein-protein interactions may be inferred incorrectly or missed entirely. Prior to this thesis, statistical quality control measures had not been incorporated into these experiments.
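To make the prey classification step concrete, the sketch below computes a one-sided Fisher's exact p-value for a single prey protein and maps it to a category. The abstract does not state how the 2x2 table is formed, so the layout here is an assumption: rows are "prey present" and "prey absent", columns are "samples from the bait of interest" and "all other samples". The function names `fisher_exact_greater` and `classify_prey`, the cutoff `alpha = 0.05`, and the counts in the example are hypothetical and only illustrate the idea.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Assumed layout (hypothetical):
      rows:    prey present / prey absent
      columns: samples from the bait of interest / all other samples
    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    present = a + b        # row total: samples in which the prey was observed
    bait = a + c           # column total: samples from the bait of interest
    n = a + b + c + d      # all samples in the association matrix
    upper = min(present, bait)
    tail = sum(
        comb(bait, k) * comb(n - bait, present - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(n, present)


def classify_prey(a: int, b: int, c: int, d: int, alpha: float = 0.05) -> str:
    """Hypothetical decision rule: 'Specific' when the p-value is below alpha."""
    return "Specific" if fisher_exact_greater(a, b, c, d) < alpha else "Non-Specific"


# Illustrative counts only (not thesis data): 4 bait samples, 16 other samples.
print(classify_prey(4, 1, 0, 15))  # seen in 4/4 bait samples, 1/16 others -> Specific
print(classify_prey(2, 8, 2, 8))   # seen in 2/4 bait samples, 8/16 others -> Non-Specific
```

In the first case the p-value is 16/15504 (about 0.001); in the second it is about 0.71, because a prey seen in half of all samples is not concentrated in the bait of interest. The same one-sided p-value should be obtainable from `scipy.stats.fisher_exact(table, alternative="greater")`; the hand-written version is used here only to keep the sketch dependency-free.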
This thesis studies the implementation of traditional Individual/Moving Range (IMR) charts and cumulative sum (cusum) quality control methods for pull-down experiment data. These methods are illustrated using a standard protein mixture from PNNL. Applied jointly to data from control samples run through the mass spectrometry process, IMR and cusum charts promise to give researchers information on changes in both the mean and the variability of the data.
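As a rough illustration of how the two chart types complement each other, the sketch below computes IMR limits and a two-sided tabular cusum for one quality metric recorded on repeated runs of a control sample. The choice of metric (for example, the number of peptides identified per run), all data values, the function names, and the cusum settings k = 0.5 and h = 5 (in units of the estimated standard deviation) are assumptions for illustration, not taken from the thesis. The constants 2.66, 3.267 and 1.128 are the standard Shewhart factors for moving ranges of two consecutive observations.

```python
from statistics import mean


def imr_limits(x):
    """Individuals (I) and moving-range (MR) chart limits for a baseline series.

    Returns (lower, center, upper) for each chart plus sigma estimated as MRbar/d2.
    """
    mr = [abs(cur - prev) for prev, cur in zip(x, x[1:])]  # ranges of consecutive runs
    xbar, mrbar = mean(x), mean(mr)
    return {
        "I": (xbar - 2.66 * mrbar, xbar, xbar + 2.66 * mrbar),
        "MR": (0.0, mrbar, 3.267 * mrbar),
        "sigma_hat": mrbar / 1.128,  # d2 = 1.128 for subgroups of size 2
    }


def tabular_cusum(x, mu0, sigma, k=0.5, h=5.0):
    """Two-sided tabular cusum; returns indices where C+ or C- exceeds h * sigma."""
    K, H = k * sigma, h * sigma
    c_plus = c_minus = 0.0
    signals = []
    for i, xi in enumerate(x):
        c_plus = max(0.0, xi - (mu0 + K) + c_plus)     # accumulates upward drift
        c_minus = max(0.0, (mu0 - K) - xi + c_minus)   # accumulates downward drift
        if c_plus > H or c_minus > H:
            signals.append(i)
    return signals


# Hypothetical QC metric for a standard mixture: baseline runs, then new runs.
baseline = [100, 102, 98, 101, 99, 100, 103, 97, 101, 99]
new_runs = [100, 101, 104, 105, 106, 105, 107]

limits = imr_limits(baseline)
print(limits["I"], limits["MR"])
print(tabular_cusum(new_runs, mu0=limits["I"][1], sigma=limits["sigma_hat"]))
```

With these made-up numbers the I chart limits are roughly 92.0 to 108.0, so no single new run falls outside them, while the cusum signals at the sixth and seventh new runs (indices 5 and 6) because it accumulates a small but sustained upward shift. Catching gradual drift that individual points do not reveal is the usual argument for running cusum alongside IMR charts.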