Dimension reduction in PCA: likelihood-based methods

by Kamolchanok Choochaow

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Statistics

MONTANA STATE UNIVERSITY
Bozeman, Montana

July 2002

© Copyright by Kamolchanok Choochaow (2002)

APPROVAL of a dissertation submitted by Kamolchanok Choochaow

This dissertation has been read by each member of the dissertation committee and has been found to be satisfactory regarding content, English usage, format, citations, bibliographic style, and consistency, and is ready for submission to the College of Graduate Studies.

Robert J. Boik (Signature) (Date)

Approved for the Department of Mathematical Sciences
Kenneth L. Bowers (Signature) (Date)

Approved for the College of Graduate Studies
Bruce McLeod (Signature) (Date)

STATEMENT OF PERMISSION TO USE

In presenting this dissertation in partial fulfillment of the requirements for a doctoral degree at Montana State University, I agree that the Library shall make it available to borrowers under rules of the Library. I further agree that copying of this dissertation is allowable only for scholarly purposes, consistent with "fair use" as prescribed in the U.S. Copyright Law. Requests for extensive copying or reproduction of this dissertation should be referred to Bell & Howell Information and Learning, 300 North Zeeb Road, Ann Arbor, Michigan 48106, to whom I have granted "the exclusive right to reproduce and distribute my dissertation in and from microform along with the non-exclusive right to reproduce and distribute my abstract in any format in whole or in part."

Signature ____________________  Date ____________________

ACKNOWLEDGEMENTS

I would like to thank my adviser, Professor Robert J. Boik, for his invaluable assistance and guidance during the preparation of this thesis. I truly appreciate his encouragement and patience. I am also grateful to John Borkowski, Jim Robinson-Cox, Steve Cherry, and Bill Quimby for their advice throughout my graduate studies. Lastly, I would like to thank my dad, mom, and husband for their love and support.

TABLE OF CONTENTS

LIST OF TABLES ..............................................
vii LIST OF FIGURES.................................................................................................. x 1. INTRODUCTION ............................................................................................. I Principal Component Analysis......................................................................... 2 Reduction in the Number of Components....................................................... 4 Reduction in the Number of Original Variables.............................................. 12 Example I .................................... 30 Example 2 ........................................................................................................... 32 2. PARAMETERIZATION....................... 35 Spectral M odel................................................................................................... 35 Parameterization of Eigenvalues ...................................................................... 36 Reduction in the Number of Variables..................................................... 37 Reduction in the Number of Components................................................. 41 Parameterization of Eigenvectors ..................................................................... 52 Solving for Implicit Parameters........................................................................ 70 Solving for r ................................................................................................ 70 Solving for r/ ................................................................................................ 71 3. ESTIMATING PARAMETERS AND CONSTRUCTING ASYMP­ TOTIC DISTRIBUTIONS ........................................................ 72 Loglikelihood Function....................................................................................... 73 Fisher Scoring Algorithm................................................................................... 75 Solving the Likelihood Equation...................................................................... 76 Finding the Initial Guesses.............................................................................. 77 Asymptotic Distributions of Estimators.....................................'.................... 81 Normal Population....................................................................................... 81 Non-normal Population............................................................................... 85 4. TESTING HYPOTHESIS AND BARTLETT CORRECTION .................. 87 Hypotheses................................. 87 Likelihood Ratio T est................................................................ 89 Bartlett Correction..................... ■....................................................................... 92 Third and Fourth Order Moments............................................................. 92 Expectation of LRT Statistic ..................................................................... 98 Under a Restricted Ha ................................................................................. 100 Under an Unrestricted Ha .......................................... 100 Matrix Expressions for Z i and K i ................................................................... 101 5. SIMULATION..................................................................................................... 108 Simulation Study of Variable Reduction (Redundancy) Tests ............... 
108 The Comparison Test................................................................................... 109 Variance Reduction Procedure................................................................... I l l Discussion..................................................................................................... 132 Simulation Study of Component Reduction T ests ........................................... 133 Numerical Examples.......................................................................................... 135 Example I. (continued) ............................................................................... 135 Example 2. (continued) ............................................................................... 136 6. CONCLUSION......................................................................... 138 REFERENCES................................................................................................. 140 APPENDICES........................................................................................................... 144 APPENDIX A - Notation................................................................................. 145 APPENDIX B - List of Matlab Subprograms............................................... 147 APPENDIX C - Programming Codes............................. ; .............................. 151 vi vii LIST OF TABLES Table Page 1. Eigenvalues of Covariance Matrix: Photographic Example.................... 31 2. Eigenvectors of Covariance Matrix: Photographic Example.................. 31 3. Eigenvalues of Covariance Matrix: Women’s Track Example................. 34 4. Eigenvectors of Covariance Matrix: Women’s Track Example............... 34 5. Probabilities of Type I error when (p, k, q) = (10,2,4) and A = 04,10,25)' ............................................................................................. 116 6. Probabilities of Type I error when (p, k, q) = (10,2,4) and A = (lg, 5,10)' 116 7. Probabilities of Type I error when (p, k, q) = (10,2,4) and A = (1'8, 2,10)' 117 8. Probabilities of Type I error when (p,k,q) = (10,2,4) and A = (4 .1 .5 .10) ' .......................................................................................... 117 9. Probabilities of Type I error when (p, k, q) — (10,2,1) and A = (4 .10 .25) ' .......................................................................................... 118 10. Probabilities of Type I error when (p, k, q) = (10,2,1) and A = (1'8, 5,10)' 118 11. Probabilities of Type I error when (p, k, q) = (10,2,1) and A = (18, 2,10)' 119 12. Probabilities of Type I error when (p, k, q) = (10,2,1) and A = (4 .1 .5 .10) ' .......................................................................................... 119 13. Probabilities of Type I error when (p,k,q) — (10,4,2) and A = (4.10.10.10.25) ' ................................................................................ 120 14. Probabilities of Type I error when (p,k,q) = (10,4,2) and A = (4 .5 .5 .10 .10) ' .................................................................................... 120 15. Probabilities of Type I error when (p,k,q) = (10,4,2) and A = (4 .2 .5 .10 .10) ' 121 16. Probabilities of Type I error when (p,k,q) = (10,4,2) and A = % 1.5,5,10,10)'.................................................................................... 121 17. Probabilities of Type I error when (p,k,q) = (10,4,5) and A = (1^,10,10,10,25)'................................................................................... 122 18. 
Probabilities of Type I error when (p,k,q) — (10,4,5) and A = % 5,5,10,10)'......................................... 122 19. Probabilities of Type I error when (p, k, q) = (10,4, 5) and A = (1^,2,5,10,10)'....................................................................................... 123 20. Probabilities of Type I error when (p, k, q) = (10,4,5) and A = (1^,1.5,5,10,10)'.................................................................................... 123 21. Probabilities of Type I error when (p,k,q) = (15,2,1) and A = % , 15,35)'............................................................................................. 124 22. Probabilities of Type I error when (p,k,q) = (15,2,1) and A = (1L;,10,15)'............................................................................................. 124 23. Probabilities of Type I error when (p, k, q) = (15,2,1) and A = 2,15)'............................................................................................... 125 24. Probabilities of Type I error when (p, k, q) — (15,2,1) and A = (T13, 1.5,15)'....................................................... 125 25. Probabilities of Type I error when (p,k,q) = (15,4,2) and A = (T11, 15,15,15,35)'................................................................................. 126 26. Probabilities of Type I error when (p,k,q) = (15,4,2) and A = (T11, 10,10,15,15)'................................................................................. 126 27. Probabilities of Type I error when (p,k,q) — (15,4,2) and A = (T11, 2,10,15,15)' .................................................................................. 127 28. Probabilities of Type I error when (p, k, q) = (15,4,2) and A = (Tn , 1.5,10,15,15)' ............................................................................... 127 29. Power of test when A = (1'8,10,25)', k = 2, q — 1 .................................. 128 30. Power of test when A = (T8, 2,10)', k = 2, q = l ................................... 129 viii 31. Power of test when A = (lg, 10,10,10,25)', k = A, q = 2 ........................ 130 32. Power of test when A = (lg, 2,5,10,10)', k = '4, q = 2 ......................... 131 33. Percentage coverage of Cl when (p, k) = (8,2) and A = (lg, 10,25)' .... 134 34. Percentage coverage of Cl when (p, k) = (8,2) and A = (1'6, 2,10)'........ 134 35. Percentage coverage of Cl when (p, k) = (8,4) and A = (I4 ,10,10,10,25)' 134 36. Percentage coverage of Cl when (p, k) = (8,4) and A = (I4 ,2,10,10,10)' 134 37. P-value of Testing Hypothesis: Women’s Track Example....................... 136 ix XLIST OF FIGURES Figure Page 1. Ideal Scree Graph......................................................................................... 6 2. A Profile Plot of the Expected Mean........................................................ 33 3. Power of test when A = (Iz8, 10,25)', k = 2, q — I .................................. 128 4. Power of test when A = (18, 2,10)', Jc = 2, q = I .................................... 129 5. Power of test when A = (1'6, 10,10,10,25)', k = 4, q = 2 ......................... 130 6. Power of test when A = (1'6, 2,5,10,10)', k = 4, q = 2 ........................... 131 ABSTRACT The main objective of this thesis is to develop procedures for making inferences about the eigenvalues and eigenvectors of a covariance matrix. Specifically, new pro­ cedures for examining dimension reduction in principal component analysis (PCA) are developed. 
The dimension reduction consists of the following two aspects: reduc­ tion in the number of components and reduction in the number of original variables. The procedures are based on a likelihood approach. Parameterizations of eigenval­ ues and eigenvectors are presented. The parameterizations allow arbitrary eigenvalue multiplicities. The use of the Fisher scoring algorithm for computing maximum like­ lihood estimates of the covariance parameters subject to multiplicity and other con­ straints is discussed. Asymptotic distributions of estimators of covariance parameters are derived under normality and non-normality. Likelihood ratio tests and Bartlett corrections are described. Simulation studies show the effectiveness of Bartlett cor­ rections. The new procedures are demonstrated to give better overall results than some existing methods. ICHAPTER I INTRODUCTION The main objective of this thesis is to develop procedures for making inferences about the eigenvalues and eigenvectors of a covariance matrix. Specifically, new pro­ cedures for examining dimension reduction in principal component analysis (PCA) are developed. The procedures are based on a likelihood approach. The first chapter of this thesis briefly describes principal component analysis followed by a discussion of dimensionality reduction methods. Two examples are given. In Chapter 2, parameterizations of eigenvalues and eigenvectors are presented. Two eigenvalue parameterizations are adopted under different objectives of use. The parameterizations allow arbitrary eigenvalue multiplicities. The Newton-Raphson algorithm is used to solve implicit parameter equations. The hypothesis test of re­ dundancy is described and an eigenvector parameterization under redundancy is pro­ posed. Chapter 3 describes the use of the Fisher scoring algorithm for computing max­ imum likelihood estimates of the covariance parameters subject to multiplicity and other constraints. Asymptotic distributions of estimators of covariance parameters are derived under normality and non-normality. In Chapter 4, likelihood ratio tests and Bartlett corrections are described. The results of simulations that examine the 2effectiveness of Bartlett corrections are presented in Chapter 5. Two numerical ex­ amples also are illustrated. Principal Component Analysis Investigators often measure or make observations on a large number of variables. There are several useful techniques to reduce the dimensionality of data without the loss of much information such as factor analysis and cluster analysis. Principal component analysis also is one such technique and is one of the most widely-used multivariate techniques. Typically, principal component analysis is used to reduce the dimensionality of a data set, while retaining as much of the original information as possible. This is achieved by transforming the original set of variables into a smaller set of linear combinations called principal components. These new variables are uncorrelated and ordered so that the first principal component accounts for the largest proportion of the variation present in the original set of variables. The usual objective of the analy­ sis is to determine whether the first few components account for most of the variation in the original data. If they do, then these components can be used to summarize the data with little loss of information. This dimension reduction can be useful in simplifying subsequent analyses. Before defining principal components, some notation will be described. 
Matrices will be denoted by boldface upper case letters. Vectors will be denoted by boldface lower case letters. A diagonal matrix having diagonal elements a_1, a_2, . . . , a_n is denoted by diag(a_1, . . . , a_n), and the trace of a matrix A is denoted by tr(A). The p × p identity matrix is denoted by I_p.

Suppose that the vector of original variables y = (y_1 y_2 · · · y_p)' has a positive definite covariance matrix Σ. For mathematical convenience and without loss of generality, assume that the mean of y_i is zero for all i = 1, 2, . . . , p. Because Σ is symmetric and positive definite, all eigenvalues of Σ are real and positive. Let λ_1 ≥ λ_2 ≥ · · · ≥ λ_p > 0 be the ordered eigenvalues of Σ and let Γ = (γ_1 γ_2 · · · γ_p) be a p × p orthogonal matrix such that

Σ = ΓΛΓ',   (1.1)

where Λ = diag(λ_1, λ_2, . . . , λ_p). That is, γ_i is an eigenvector of Σ corresponding to the eigenvalue λ_i. The principal components of y are the entries of the p-vector z, where z is the linear combination

z = Γ'y.   (1.2)

The covariance matrix of z, Σ_z, can be written as Σ_z = Γ'ΣΓ. Substituting Σ from (1.1) yields Σ_z = Γ'(ΓΛΓ')Γ = Λ. Hence the components z_1 = γ_1'y, z_2 = γ_2'y, . . . , z_p = γ_p'y are uncorrelated, and the variance of z_i is λ_i. The maximum value of the variance of γ'y among vectors satisfying γ'γ = 1 is equal to λ_1, the largest eigenvalue of Σ. This maximum occurs when γ is an eigenvector of Σ corresponding to λ_1. That is,

max_{γ'γ = 1} γ'Σγ = γ_1'Σγ_1 = λ_1.

Then the component z_1 has the largest variance λ_1, z_2 has the second largest variance λ_2, and so on. The total variance of the p principal components is equal to the total variance of the original variables, so that

Σ_{i=1}^p λ_i = tr(Σ).

Consequently, the jth principal component accounts for a proportion λ_j / tr(Σ) of the total variance of the original variables.

This thesis focuses on two major aspects of dimensionality reduction:

1. Reduction in the number of components.
2. Reduction in the number of original variables.

Reduction in the Number of Components

The major problem in reducing the number of components is deciding how many principal components should be retained in order to account for most of the variability in the original data. Some rules of thumb have been suggested. Kaiser (1958) suggested a cut-off point: retain the components whose sample variances are greater than the average, that is, greater than 1 for a correlation matrix. It can be argued that if one variable is nearly independent of all other variables, it will dominate a component with variance slightly less than 1, assuming that the variables have been standardized. Since this variable provides information independent of the other variables, there is no reason to discard it. Kaiser's rule is arbitrary and tends to retain too few components (Jolliffe, 1986). Rencher (1995) suggested that investigators should retain sufficient components to account for a specified percentage of the total variance. Figures between 70 and 90 percent are generally suggested.

Three methods are in common use for determining the number of components to be retained. The first method is graphical and is called a scree graph. This technique was discussed and named by Cattell (1966). The scree graph consists of a plot of the ordered eigenvalues against eigenvalue number. With this plot, the components to be retained correspond to the eigenvalues plotted in the steep curve above the straight line formed by the smaller eigenvalues. Thus, in Figure 1, the first two components would be retained.
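As an informal illustration of how a scree graph is produced in practice, the following short Matlab sketch (not part of the original text) sorts the eigenvalues of a sample covariance matrix and plots them against their index. The simulated data matrix Y and its dimensions are hypothetical.

    % Minimal sketch of a scree graph; the data are simulated for illustration only.
    n = 100; p = 9;                              % hypothetical sample size and dimension
    Y = randn(n, p) * diag([5 3 ones(1, p-2)]);  % two dominant directions by construction
    S = cov(Y);                                  % sample covariance matrix
    l = sort(eig(S), 'descend');                 % ordered sample eigenvalues l_1 >= ... >= l_p
    plot(1:p, l, '-o');
    xlabel('Eigenvalue number'); ylabel('Eigenvalue'); title('Scree graph');
    cumprop = cumsum(l) / sum(l)                 % cumulative proportions of total variance

The elbow of the plot, or the cumulative proportions, can then be read in the manner described above.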
Jolliffe (1986) suggested plotting log(λ_i) rather than λ_i. This plot is known as a log-eigenvalue diagram.

The second method is testing the hypothesis that the last m = p − k eigenvalues of Σ are equal. Anderson (1963) derived the likelihood ratio test for H_0k: λ_{k+1} = · · · = λ_p = λ against the alternative that at least two of the last m eigenvalues are different.

Figure 1. Ideal Scree Graph (ordered eigenvalues plotted against eigenvalue number).

Let y_1, y_2, . . . , y_N be a random sample of size N = n + 1 from a p-variate normal distribution with mean μ and covariance matrix Σ. If y can be written as Cw + e, where C is a p × k matrix with rank k, E(e) = 0, var(e) = σ²I_p, and var(w) = I_k, then var(y) = Σ = CC' + σ²I_p. Write CC' in diagonal form as CC' = Γ_1Λ_1Γ_1' and let Γ_2 be an orthogonal complement to Γ_1. That is, I_p = Γ_1Γ_1' + Γ_2Γ_2' and (Γ_1 Γ_2) ∈ O(p), where O(p) denotes the group of orthogonal p × p matrices. The covariance matrix of y can be rewritten as

Σ = Γ_1Λ_1Γ_1' + σ²I_p = Γ_1Λ_1Γ_1' + σ²(Γ_1Γ_1' + Γ_2Γ_2') = Γ_1(Λ_1 + σ²I_k)Γ_1' + σ²Γ_2Γ_2'.

Therefore, the first k eigenvalues are λ_i + σ², i = 1, 2, . . . , k, where Λ_1 = diag(λ_i), i = 1, 2, . . . , k, and the last m = p − k eigenvalues are σ². Anderson derived the test for testing the equality of the last m eigenvalues. The test statistic is

Λ_k = ( Π_{i=k+1}^p l_i / l̄^m )^{n/2},   (1.3)

where l_i, i = 1, 2, . . . , p, are the eigenvalues of the sample covariance matrix S and l̄ = Σ_{i=k+1}^p l_i / m. If H_0k is true, then the limiting distribution of −2 ln Λ_k is χ²_ν, where ν = (m + 2)(m − 1)/2. These results can be summarized as

−2 ln Λ_k = n ( m ln l̄ − Σ_{i=k+1}^p ln l_i ).

Bartlett (1954) improved Anderson's test by using a multiplying factor. Bartlett's test is to reject H_0k whenever

( n − k − (2m² + m + 2)/(6m) ) ( m ln l̄ − Σ_{i=k+1}^p ln l_i ) > χ²_{1−α, ν},

where χ²_{1−α, ν} is the 100(1 − α) percentile of the χ²_ν distribution. Lawley (1956) proposed a further improvement in Bartlett's multiplying factor. Lawley's test is to reject H_0k whenever

( n − k − (2m² + m + 2)/(6m) + l̄² Σ_{i=1}^k (l_i − l̄)^{−2} ) ( m ln l̄ − Σ_{i=k+1}^p ln l_i ) > χ²_{1−α, ν}.

Acceptance of H_0k suggests that each of the last m components contains the same amount of information. If these eigenvalues are very small, then little information is lost by discarding the corresponding principal components. The first k components should be retained for further analysis. A sequence of tests can be conducted starting with k = 0 and increasing k until the null hypothesis is accepted.

Bentler and Yuan (1996) developed a method of testing for a linear trend in the last m = p − k eigenvalues of the covariance matrix. The hypothesis of interest is

H_0m: λ_{k+i} = α + βx_i,  x_i = m − i,  i = 1, 2, . . . , m.

The likelihood ratio test statistic is

Λ_m = sup_{μ, Σ_0} L(μ, Σ_0) / sup_{μ, Σ} L(μ, Σ),

where Σ_0 = Γ_0Λ_0Γ_0' and Λ_0 = diag(λ_1, . . . , λ_k, α + βx_1, α + βx_2, . . . , α + βx_m). For both Σ_0 and Σ, L(μ, Σ) is maximized when μ = ȳ. To maximize L(μ, Σ_0), Bentler and Yuan showed that λ̂_i = l_i for i = 1, 2, . . . , k are the maximum likelihood estimators (MLEs) of λ_1, . . . , λ_k under H_0m. Maximum likelihood estimates of α and β were obtained numerically by solving the following equations:

Σ_{j=1}^m [ n l_{k+j} − (n + 1)(α + βx_j) ] / (α + βx_j)² = 0,
Σ_{j=1}^m [ n l_{k+j} − (n + 1)(α + βx_j) ] x_j / (α + βx_j)² = 0.

The asymptotic distribution of −2 ln Λ_m is χ²_ν with ν = m − 2. If all x_i are equal to zero, then the hypothesis H_0m is equivalent to H_0k as considered by Anderson (1963).

Boik (2002a) proposed a method for modeling the eigenstructure of several covariance matrices simultaneously. The proposed model for the ith covariance matrix is as follows:
Σ_i = Γ_iΛ_iΓ_i',   i = 1, 2, . . . , g,

where the eigenvector matrices Γ_i and the eigenvalue matrices Λ_i may share certain properties. The model extends common principal components (Flury, 1988) and subsumes Anderson's (1963) model, Lawley's (1956) model, and Bentler and Yuan's (1996) model as special cases when g = 1. Boik presented an algorithm to compute maximum likelihood estimates of covariance parameters and also gave likelihood ratio tests including Bartlett corrections. Several of his results are referred to throughout this thesis.

The third method is based on the cumulative proportion of total variance. If λ_1 ≥ λ_2 ≥ · · · ≥ λ_p are the ordered eigenvalues, then the cumulative proportion of the first k eigenvalues, δ_k, and of the last m = p − k eigenvalues, δ_m, are estimated by

δ̂_k = Σ_{i=1}^k l_i / Σ_{i=1}^p l_i   and   δ̂_m = Σ_{i=k+1}^p l_i / Σ_{i=1}^p l_i,

where l_i, i = 1, 2, . . . , p, are defined in (1.3); the population proportions δ_k and δ_m are defined analogously in terms of λ_i. Anderson (1963) proposed a test statistic for the hypothesis

H_0m: δ_m = δ_{m,0},   (1.4)

where δ_{m,0} is a known constant. If δ_m is small, then little information is lost by discarding the corresponding principal components. Anderson showed that if λ_k > λ_{k+1}, then δ̂_m is asymptotically distributed as

δ̂_m ∼ N( δ_m, V(δ̂_m) ),   V(δ̂_m) = [ 2δ_m² Σ_{i=1}^k λ_i² + 2(1 − δ_m)² Σ_{i=k+1}^p λ_i² ] / [ n ( Σ_{i=1}^p λ_i )² ].   (1.5)

The test rejects H_0m whenever |z*| > z_{α/2}, where

z* = (δ̂_m − δ_{m,0}) / √V̂(δ̂_m),   (1.6)

z_{α/2} is the upper 100α/2 percentage point of the standard normal distribution, and V̂(δ̂_m) is an estimator of the variance of δ̂_m, which can be computed by replacing λ_i by l_i in the variance of δ̂_m in (1.5).

Sugiyama and Tong (1976) studied an approximate distribution of the cumulative proportion of the first k eigenvalues, δ̂_k, and interpreted this quantity as a measure of the amount of information in the k retained components. Assuming that the eigenvalues are distinct, they derived a perturbation expansion for the distribution of δ̂_k that is accurate to O(n^{−3/2}).

Huang and Tseng (1992) devised the following method of selecting the number of components to be retained. Let δ̂_k = Σ_{i=1}^k l_i / Σ_{i=1}^p l_i, and suppose that the first q components are considered important in the sense that δ_q ≥ ω, where ω is a fixed constant and q is an integer less than p. Their decision rule is to select k components, where k is the smallest integer that satisfies δ̂_k > c and c is a fixed constant to be determined. They discussed how to determine N and c such that the minimum probability of retaining the important components is at least some specified level P*, 0 < P* < 1. That is,

min { inf P_λ(δ̂_k > c) } ≥ P*.   (1.7)

Huang and Tseng showed that if ω > 0.5, k ≤ p/2, and N and c satisfy

c ≤ ω{ 1 − (1 − ω) √( 2p / [(p − 1)(N − 1)] ) Φ^{−1}(P*) },

where Φ(x) is the c.d.f. of the standard normal distribution, then (1.7) holds.

In this thesis, a likelihood approach for making inferences about the cumulative proportion of total variance is given. The hypothesis of interest is

H_0: C_1'λ / (1_p'λ) = c_0,   (1.8)

where C_1 is a matrix of known constants and c_0 is a vector of known constants. This hypothesis is more flexible and general than the hypothesis in (1.4). If C_1 is (0_k' 1_{p−k}')' and c_0 is a scalar, then the hypothesis in (1.4) is subsumed as a special case of the hypothesis in (1.8). Specific goals are to obtain maximum likelihood estimates (MLEs) of covariance parameters, to construct a likelihood ratio test of the hypothesis, and to construct confidence intervals for C_1'λ / (1_p'λ). The MLEs of the parameters and a likelihood ratio test for the hypothesis in (1.8) are given using one of the parameterizations of eigenvalues presented in Chapter 2. The parameterizations allow for arbitrary eigenvalue multiplicities.
A Bartlett correction is derived to improve the 12 likelihood ratio test. The tests are inverted to obtained confidence intervals. The like­ lihood ratio intervals and Bartlett-corrected intervals are compared with Anderson’s intervals inverted from (1.6) by means of simulations. The results of the simulations are described in Chapter 5. Reduction in the Number of Original Variables A principal component is a linear combination of all of the original variables. Thus, interpretation and subsequent data analysis involve all of the variables even if some components are discarded. Not only is it useful to reduce the number of components but it also is useful to reduce the number of the original variables. If the first k principal components have zero coefficients on a set of original variables, then those variables are redundant. These redundant variables can be eliminated from the study and need not be collected in any subsequent studies. The term “redundancy” has been employed in other contexts , e.g. multivariate regression (Lazraq, 2001), discriminant analysis (Pujikoshi, 1985), canonical correlation (Van Den Wollenberg, 1977; Gleason, 1976), and growth curve analysis (Pujikoshi and Rao, 1991). The term takes a different meaning in each of these contexts. In this thesis, the principal components of y are defined as in (1.2). It follows that y = Pz. The coefficient for obtaining the ith variable from the j th principal component is 7^ and is called a loading. A redundant variable is defined to be an original variable 13 whose loadings on the first k principal components are zero. Two approaches for determining the number of variables and which variables to retain will be discussed, namely the rejection approach and the hypothesis testing approach. A rejection approach is a set of cut-off rules for choosing how many and which variables to be retained. A hypothesis test is a method of inference to decide which of two complementary hypotheses is true. The rejection approach will be described first. Beale, Kendall and Mann (1967) developed cut-off rules that specify the number of variables to be discarded. However, they focused primarily on multiple regression analysis rather than on principal component analysis. In multiple regression with p independent variables, they suggested selecting the set of r variables that maximizes the multiple correlation of the dependent variable with the r independent variables. An extension of this rule to principal components is to retain set of r variables which maximizes the minimum multiple correlation between r selected variables and any of p — r discarded variables. They also mentioned another rejection approach for use in principal component analysis. With all p variables, p principal components are obtained. If pi eigenvalues are smaller than some number, A0, then the last Pi components are inspected. Starting with the one corresponding to the smallest eigenvalue, then the next component corresponding to the second smallest eigenvalue and so forth, the variable that has the largest coefficient in the component and has not already been deleted by a previously considered component is deleted and the 14 number of variables is reduced fromp to p—P1. Another principal component analysis is done on the remaining p — pi variables. Similarly, if p2 eigenvalues are smaller than the A0, P2 variables associated with the largest coefficients in the last p2 components are rejected. The process continues until all eigenvalues are greater than the A0. 
The number of variables is reduced from p to r = (p - Pi - • • • — p*). The value of r is determined by choice of A0. Jolliffe (1972) discussed the appropriate A0 based on simulation studies. Jolliffe (1972) discussed eight rejection methods for deciding which variables to be discarded. The eight rejection methods were divided into three groups: namely . multiple correlation methods (Al and A2), principal component methods (BI, B2, B3, and B4), and cluster methods (Cl and C2). These methods are briefly described below. o Multiple Correlation Methods — Method Al is the method of Beale, Kendall and Mann (1967). — Method A2 is a stepwise method. In multiple regression with p independent variables, a subset of independent variables is selected , in each step, such that the variable having maximum multiple correlation with the remaining variables is deleted. The process is repeated until r variables are retained. o Principal Component Methods — Method BI also is based on Beale, Kendall and Mann (1967) using PCA. 15 — Method B2 is the same as Method BI except that only one principal com­ ponent analysis is done. If k variables are to be retained, then the last p—k components are inspected. For each of these p — k components, starting with the last component, the variable that has the largest coefficient in the component and has not already been deleted by a previously considered component is deleted. — Method B3 uses the last p — k components. The sum of squares of coeffi­ cients of each of the p variables in the last p — k components are computed. The p — k variables that have the largest sum of squares are rejected. - Method B4 uses the first k components. For each of k components, starting with the first component, the variable that has the largest coefficient in the component and has not already been selected by a previously considered component is selected. Variables that are not selected are rejected. o Cluster Methods - Method Cl is a single-linkage method (Seber, 1984). A measure of sim­ ilarity between two clusters of variables X and Y is defined by rxy such that Txy = max Vi j , (1.9) where is the correlation coefficient between variable i and j . For each of p(p - l ) /2 pairs of p variables,,rxy are computed. Two clusters having . the maximum rxy are combined into a single cluster. For the new set of 16 clusters, the process returns to calculate rxy and so forth. The process continues until k clusters of variables remain. The principal component analysis is based on k variables, one chosen from each cluster. Jolliffe described 3 ways to select a variable from each cluster. - Method C2 is an average-linkage method (Seber, 1984). It follows the same steps as Method Cl except that rxy in (1.9) is replaced with _ YUex ^ j e Y rij where pi ,p2 are the numbers of variables in X and Y, respectively. These rejection methods were tested on artificial data. The artificial data were con­ structed so that certain variables were linear combination of other variables. For example, if P i = P 2 + e, where e is a random disturbance, then either pi or p2 may be discarded without loss of information. It was shown that several rejection methods eliminated precisely those certain variables. In general, Method A2 is the most con­ sistent method because it only retained good or best subsets. Method B4 retained best subsets more often than Method A2 but it retained moderate or bad sets fairly frequently. 
Jolliffe (1973) discussed the five methods, A1, B2, B4, C1, and C2, which had been used successfully on artificial data in Jolliffe (1972). He applied these methods to four sets of real data. The methods were equally effective with real and artificial data, although none of the rejection methods was apparently better than the others.

McCabe (1984) proposed methods to select a subset of variables, called principal variables, that contains as much information as possible. McCabe gave four optimality criteria to evaluate all possible subsets. These criteria are based on the conditional covariance matrix of the variables not selected, given those selected. Let Σ be partitioned as

Σ = ( Σ_11  Σ_12 ; Σ_21  Σ_22 ),

where Σ_11 is the covariance matrix of the selected variables and Σ_22 is the covariance matrix of the discarded variables. The conditional covariance matrix of the variables not selected, given those selected, can be written as

Σ_22.1 = Σ_22 − Σ_21 Σ_11^{−1} Σ_12.

Let θ_i, i = 1, 2, . . . , p − k, be the eigenvalues of Σ_22.1 and λ_i, i = 1, 2, . . . , p, be the eigenvalues of Σ. McCabe proposed the following four criteria:

1. min |Σ_22.1| = min Π_{i=1}^{p−k} θ_i,
2. min tr(Σ_22.1) = min Σ_{i=1}^{p−k} θ_i,
3. min ||Σ_22.1||² = min Σ_{i=1}^{p−k} θ_i²,
4. max Σ_i ρ_i² = max tr(Σ_11^{−1} Σ_12 Σ_22^{−1} Σ_21),

where the ρ_i are the canonical correlations between the variables not selected and those selected. The percentage of variance explained by a set of k principal variables is

( 1 − Σ_{i=1}^{p−k} θ_i / Σ_{i=1}^p λ_i ) × 100%.

This percentage can be compared to the variation explained by the first few principal components.
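As an informal illustration of McCabe's criteria, the following Matlab sketch (not part of the original text) evaluates the four criteria for a single candidate subset. The covariance matrix S and the index set sel are hypothetical; in practice every subset of a given size, or a stepwise search, would be compared on these criteria.

    % Minimal sketch evaluating McCabe's criteria for one candidate subset.
    sel  = [1 2 3];                          % hypothetical selected variables
    disc = setdiff(1:size(S,1), sel);        % discarded variables
    S11 = S(sel, sel);   S12 = S(sel, disc);
    S21 = S(disc, sel);  S22 = S(disc, disc);
    S221 = S22 - S21 / S11 * S12;            % conditional covariance of discarded given selected
    theta = eig(S221);
    crit1 = prod(theta);                     % |Sigma_22.1|
    crit2 = sum(theta);                      % tr(Sigma_22.1)
    crit3 = sum(theta.^2);                   % squared norm of Sigma_22.1
    crit4 = trace(S11 \ S12 / S22 * S21);    % sum of squared canonical correlations
    pctvar = (1 - sum(theta) / sum(eig(S))) * 100   % percent variance explained by the subset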
Anderson (1963) gave the asymptotic distribution of the sample eigenvalues and eigenvectors when the population eigenvalues are equal in sets. Anderson considered the hypothesis Ho: 7x = 7 0, where 7% is the eigenvector of the first principal com­ ponent and 7 0 is a specified vector such that 7o70 = I. If the hypothesis is true, then rc[ii7o'Sf_17o + ^r1To-S1To ~ 2] has the limiting Chi-square distribution with degree of freedom p — I, where Z1 is the first eigenvalue of the sample covariance matrix, S. Similar results can be applied for any other eigenvector corresponding to an eigenvalue with multiplicity I. Tyler (1981) modified and extended Anderson’s results. Tyler derived asymptotic procedures for testing more general hypotheses concerning eigenvectors. Let H be a p x p matrix and W be a p x p positive definite symmetric matrix such that W H is symmetric. These conditions ensure that the eigenvalues and eigenvectors of H are real. Let Ai > A2 > • • • > Ap be the ordered eigenvalues of H and let A be a 20 p x g matrix with rank q. Lastly, let w be a subset of m integers (I < m < p) from {1,2, . . . ,p} such that Aj ^ Xj for all i G w, j ^ w. Tyler considered the following two hypotheses: for q m , Hq : The eigenvectors of H associated with the m eigenvalues, Aj, i G w, lie in the subspace generated by the columns of A. Let f31} f32, ■ ■ ■, Ppbe the eigenvectors of H and let bi, b2, . . . , bp be the eigenvectors of H n, where V^vec(Hra - H ) A N(0, S). Denote the p x m matrix whose columns are Pi for i £ w by Pw and denote the p x m matrix whose columns are bj iov i £ w by Btu. Let di > d2 >■■■> dp be the ordered eigenvalues of H ra, P 0- = P ^ P 1wPwY 1P1w, and P 0 = E ieiy A = Bto(BJ0Bw) -1BJu. Then the test statistic for H0 is Tra(A) = n {vec [(Jp - P 0)A ]}' [s(A )] vec [(Jp - P 0)A] , (1.10) 21 where S(A) = (A'& fp)c;,s Cw(AeJp), Cfra = 53 2 3 K ~ A j Y 1P i ® p ' j - itzW j f w Tyler showed that, under H0, Tra(A) X%>-m)q- The test is to reject H0 if r = rank(P 0 A) < $ or if r = g and Tra(A) > Xi-a,(p-m)q- The hypothesis Hg can be tested by using the following approach. Let B be a fixed p x ( p - q ) matrix with rank p — q whose columns are orthogonal to A, i.e. A 1B — 0. The hypothesis Hg can be rephrased as Ho : The columns of B lie in the subspace generated by the set of eigenvectors of H 1 associated with p — m eigenvalues, A%, i ^ w, and can be treated as H0. The test statistic can be simplified if a random sample has been drawn from an elliptical distribution. Denote the covariance matrix by H. Let k be an estimator of k, the kurtosis parameter. An estimator of the covariance matrix of Vec(JJra) can be written as S = (I + k)(Ip2 + J(p,p))(JJra O H n) + it Vec(JJra)[Vec(JJra)]'. The test statistic in (1.10) can be simplified as Tra(A) = n(l + k) Y i Aj1 t i [ A 1P iA l A 1 B wDjB 1wA ] ' 1} , itw (i.n ) 22 where Dj is an m x m diagonal matrix with entries di/(di — dj)2, j w. For g = m = I, Tn(A) is asymptotically equivalent to the statistic given by Anderson (1963). Flury (1986) discussed the test for redundancy of variables in the comparison of two covariance matrices. Let S 1 and S 2 be the covariance matrices of two indepen­ dent p-vectors. Flury analyzed the eigenvectors of S ^ 1S 2. Let {(3j}Pj=1 be a set of eigenvectors of S ^ 1S 2 and partition {/3J}^ =1 into the first p — q and the last q rows as follows: W P _ J=I ~~ 0 il/3j2 P J=I The hypothesis of interest is H0(p, q) : Pj2 = 0 for all j G w. 
(1.12) This hypothesis can be formulated in the form Hg of Tyler (1981), by putting A = ( I r- , 0 ')'. Then by using S = ( J9 ) ’ (L13) the hypothesis can be rephrased as H0 (p, q) : The columns of B lie in the subspace generated by the eigenvector a ;, (i ^ w) of S 2S ^ . 23 The test statistic in (1.10) can be applied. Let {bj}^=1 be a set of eigenvectors of S i 1S 2 and partition {bj}^=1 into the first p — q and the last q rows as follows: {M t= i = {- J bHbJ 2 J=I The test statistic is -R(P, q) kidjj T k2didj (di - dj)2 b^b i2 - i b J 2) where = limnj_>O0 for i = 1,2 and dj, j = 1 ,2 ,... ,p, are the eigenvalues of S 2S 11. Under H0(p, q), R(p,q) Xhg- Fujikoshi (1989) reviewed the problem of testing the hypothesis of redundancy in various multivariate situations including principal component analysis. Let {7^ be a set of eigenvectors of E and partition {7-/}j=1 into the first p - q and the last q rows as follow: p The hypothesis of redundancy of the last q variables in the first k components can be written as Ho (A;, q) : 7 ^ = 0 for all j 6 w, w = {1 ,2 ,.. . , k}, which is equivalent to H0 (A:, q) : The columns of B lie in the subspace generated by the eigenvectors Ti) (* ^ w) of E, (1-14) where B is defined in (1.13). Fujikoshi used the test statistic in (1.11) to test the hypothesis. Let k and 7^ i = 1 ,2, . . . ,p, be the eigenvalues and eigenvectors of S, 24 respectively. The test statistic can be rewritten as Tn{B) = n ^ (l l l S) where under Ho(A,g), Tn(B) -A Schott (1991) also employed the test statistic in (1.11) to test the hypothesis that in each of the first k components have zero loadings on the last q original variables. This hypothesis is equivalent to (1.14). The test is to reject H0(A:,q) if r — ran k (rr 'B ) < g, or if r = g and T*,, = > x L „ ,ia, (1.16) j£w where B is defined in (1.13) and Dj is an m x m diagonal matrix with entries h/ (k—lj)2, j ^ w, w = { 1 , 2 , , k}. The statistic T^q in (1.16) is a matrix expression for Tn(B) in (1.15). A Bartlett adjustment was proposed based on the idea that if a test statistic T has a mean which can be expressed as E(T) = o { l + ^ + 0 (n -3/2)} , (1.17) where a is the asymptotic mean and c is a constant, then the mean of adjusted statistic, T * = ( l - T, (1.18) approaches a. It follows from Tkiq that the asymptotic mean of Tkiq in (1.16) is a = kq. An expression for c can be found in Schott (1991). Schott compared the unadjusted statistic, Tkiq, and adjusted statistic, T*, under several conditions using 25 simulated data. He showed that, in general, the Bartlett adjusted statistic performs better than Tktq with respect to Type I error. Diimbgen (1995) proposed a likelihood ratio test for principal components of a matrix S . Let f2 be the set of all symmetric matrices in Rpxp and 0{p) be the set of all orthogonal matrices in Rpxp. For S G fi, let F G 0{p) such that F 'S F = A, where A = diag(Ai, A2, . . . , Ap) and Ai > A2 > • • • > Ap. Let m = (mi, m2, . . . , ma) be a partition of {1,2,... ,p} into o > I sets. The hypothesis of interest is Ho: S = FAF', where F = ®-L1Fii, A = GiL1Aii, Aii = diag(Ai) for i G mi, Fii is an Tni X m i matrix, Fii G Oimi), 5j)“=1 Tni = p, and ® stands for the direct sum. Let h > k > • • • > Ip be the eigenvalues of S and define /x = arg min \\u - Am(5)||2,z/6X(n) where \{ S ) is the vector of the ordered eigenvalues of S , and Am(S) = ^ A(Si^ )iJgmi ^ A(Siy)iJgma X A(Sy)iJgms J The exact computation of is described in Diimbgen (1995). 
The likelihood ratio test is to reject H0 whenever tk(S) < c{a), where tkiS) - Y M ^ ) . i=l 26 The test statistic requires a simulation to obtain critical values. In this thesis, a likelihood approach for making inferences about the covariance pa­ rameters is proposed and is applied to redundancy test. Let A = diag(Ai, A2, . . . , Ap) be the diagonal matrix of the eigenvalues of S and T* be a set of the corresponding eigenvectors. The subspace spanned by the columns of a matrix, T , is denoted by TZ(T). Let A be a known p x q semi-orthogonal matrix with rank q, and M* be a p x m semi-orthogonal matrix such that TZ(M*) spans the subspace generated by m specific columns of I v. The hypotheses of interest are Ho : A G Tl(TstM ie) for q m. (1.20) It follows from (1.20), see Tyler (1981), that r*M* G Tl(A) <=> A=E Tl(T*M*c), where Ac and.M"*6 are orthogonal complement to A and Af*, respectively. That is, A cA cl = Ip - A A and M*c M*0' — Ip Accordingly, the hypothesis Hg in (1.20) can be rewritten and treated as the hypothesis H0 in (1.19). Only hypothesis Hg in (1.19) is employed in the remainder of this thesis. To match the theoretical setup in the later chapters and without loss of generality, M* can always be equated to the first m columns of Ip. This simplification is possible because the columns of T* and the rows of AF can be permuted to satisfy Am oTl(M*) = Tl 27 If H{M*) then replace T*M* by TM , where T = T*P and M = P 1M*, where P is a p x p permutation matrix. Choose P so that K(P 'M *) I . In the remainder of this thesis, hypothesis (1.19) will be written as H0: A g n {TM ), where H(M) = n ( (1.21) This hypothesis is general and the hypothesis of redundancy is subsumed as a special case. In Chapter 3, an algorithm for computing MLEs of the covariance parameters satisfying multiplicity and other constraints is proposed under H0 in (1.21). A likeli­ hood ratio test and a Bartlett correction for the test of (1.21) are also developed in Chapter 4. The hypothesis of redundancy can be written as (1.21) in the following manner. First, the eigenvalues will be reordered so that Ai < A2 < • • • < Ap. This order is used throughout the remainder of the thesis. The rational for reordering the eigenvalues is described in Chapter 2. Let F be a matrix whose columns are the corresponding eigenvectors. Then, by equating A- to A = (Iq O1)1, the hypothesis in (1.21) matches with the hypothesis in (1.14). This hypothesis states that each of the first k compo­ nents has zero loadings on the last q original variables. The redundancy hypothesis is identical to that tested by Fujikoshi (1989) and Schott (1991) but it differs from the variable selection approaches of Jolliffe (1972) and McCabe (1984). To understand 28 the distinction, first note that it follows from (1.2) that y = Tz. Partition y as where y i is (p — q) x I and y2 is q x I. Partition z as z = Zi Z2 J where Zi is k x I and Z2 is (p — k) x I. The last q variables are redundant if their loadings on the first k principal components are zero. That is, the last q variables are redundant if r K rO1 r : ) ■ P-22) It follows from (1.2) that var y Z = var Ipr' y = TAT1 PA \ Ar' a J ’ where T is defined in (1.22). The conditional covariance of y2 given Z2 can be written as var(y2N ) = var(y2) - cov(r/2, ^ jv a r (Z 2)) 1Cov(Z2l^2)- If the last q variables are redundant, then var(t/2|z2) = F22A2Pg2 — P22A2A21A2P22 = 0. That is, if y2 is redundant with respect to Zi then var(y2|z2) = 0. 
On the other hand, Jolliife (1972) and McCabe (1984) reduced the number of 29 variables in such a way that the variables can be discarded if all linear functions of y can be approximately reproduced as linear combination of . It can be shown that satisfaction of the constraint (1.22) does not guarantee that any of McCabe criteria (see page 17) are optimized. For example, if F satisfies (1.22), then the conditional covariance of j/2 given can be written as 322.1 = (F% ,A 2% ) \ which can have norm as large as ( r 22A^1Fz22) = tr(A2) = (p — &)A2 if all eigenvalues in A2 are equal to A2. Conversely, ||X22.i||2 can be small even though redundancy is not satisfied. For example, suppose that y = ^ y + e y ’ w^ere var(yi) = Xn , var(e) = aXn , and a is a scalar. The covariance matrix of y can be written as X = <8> Xn .I l + o Let F n be the matrix of eigenvectors of Xn . Then, the eigenvector matrix of X, F, can be expressed as F = Z X l + \ A + f I - V L T f ) The conditional covariance of y2 given y i can be written as ® rn . X22.i — (I + o)Xn — Xn X111Xn — oXn . McCabe criteria can be satisfied by choosing o to be a small number. If a is close to zero, then F approaches r - i 30 which does not satisfy the redundancy constraint in (1.22). Accordingly, the criteria of Jolliffe and McCabe were not constructed to reduce the number of variables with respect to redundancy. The likelihood ratio test for redundancy is compared with Schott’s tests in (1.16) and (1.18) using simulated data. A simulation study of the effectiveness of the Bartlett correction is described in Chapter 5. To illustrate the dimensionality reduction problem, two examples are described below. Example I A large number of variables were observed in a study of the quality of pictures produced by a photographic process. The data were originally presented by Jackson and Morris (1957) and were discussed by Schott (1991). The procedure for a check on the process was as follows: A film strip was given a graded series of exposures to white light and was processed. Optical densities were measured through red, green, and blue filters at the high-density portion of the characteristic curve (shadow areas), at the middle-tone portion of the curve (average picture density) and at the toe portion of the curve (highlights, whites, etc.). There were p = 9 measurements: three density levels and three colors at each level. The sample covariance matrix based on N = 109 was given in Jackson and Morris (1957). The eigenvalues and cumulative proportions of total variance are given in Table I. The eigenvectors of the first two 31 Table I. Eigenvalues of Covariance Matrix: Photographic Example. Component Eigenvalue Cumulative proportion of total variance I 878.52 0.6092 2 196.10 0.7452 3 128.64 0.8344 4 103.43 0.9062 5 81.26 0.9625 6 37.85 0.9888 7 6.98 0.9936 8 5.71 0.9976 9 3.52 1.0000 Table 2. Eigenvectors of the First Two Principal Components. Eigenvector Component High density Middle tone Toe portion red green blue red green blue red green blue I 0.305 0.653 0.483 0.261 0.324 0.271 0.002 0.006 0.014 2 -0.486 -0.151 0.588 -0.491 -0.038 0.373 0.057 0.054 0.088 principal components are presented in Table 2. The information in these tables leads to the following two questions: 1. How many components should be retained? 2. How many and which variables can be considered as redundant? Consider the first two principal components. These components account for .75% of the total sample variance. 
Table 2 reveals that first two components have small coefficients on the last three variables. The question is whether these three variables can be discarded. These two questions have been addressed in several papers referenced 32 earlier. This thesis proposes an alternative method for answering these questions. This method is based on a likelihood approach and is discussed in Chapters 2-4. Example 2 The 1984 Olympic track records dataset for women was examined. The data were first analyzed using PCA by Dawkins (1989). There were p = 7 track events; namely, 100 meters, 200 meters, 400 meters, 800 meters, 1500 meters, 3000 meters, and marathon (42,195 meters). There were 55 countries having complete records for all events. The data consist of time to complete each event by athlete from each country. Dawkins normalized the data to have mean 0 and standard deviation I and then analyzed it. Naik and Khattree (1996) discussed an alternative choice to com­ pare the athletic performance of nations. Instead of using time as the response, speed defined as distance per unit time was used. Speeds for the women track events and coefficients in the first two principal components are presented in Naik and Khattree (1996). In the present context, suppose that it is believed that variability in speed over track events can be summarized in terms of linear, quadratic, cubic, and higher order functions. A profile plot of the expected mean speed against log(distance) ap­ pears in Figure 2. It is of interest to examine the polynomial components of the response profile. In this example, the matrix C* containing the coefficients of orthog­ onal polynomials is 33 Figure 2. A Profile Plot of the Expected Mean. 9 T 8.5 ' 8 - 7.5 - s 7" 6.5 - 6 - 5.5 ■ 5 ■ 4.5 - 4F- 2.5 3 3.5 log(distance) 4.5 5 f -0.4742 0.5426 -0.4769 0.2996 -0.1380 0.0402 \ -0.3332 0.1609 0.2878 - -0.5738 0.5053 -0.2298 -0.1921 -0.1229 0.4780 - -0.1101 -0.5214 0.5410 -0.0511 -0.3087 0.2700 0.4374 -0.1839 -0.6791 0.0768 -0.3924 -0.1152 0.3879 0.5930 0.4264 0.2179 -0.3914 -0.5938 --0.4752 -0.2612 -0.0995 ^ 0.7558 0.5120 0.1502 0.0341 0.0062 0.0008 ) The first column of C* contains coefficients of the linear contrast. The second column of C* contains coefficients of the quadratic contrast etc. The matrix C* can be computed by using the Matlab program given in Appendix C. The eigenvalues of C*'SC* and cumulative proportions of total variance are given in Table 3. The eigenvectors of the first two principal components are presented in Table 4. Two questions of interest are addressed. First, how many principal components should be retained? The first component accounts for 55%. The first two components account for 81%. Second, which polynomials contribute to the largest 34 Table 3. Eigenvalues of Covariance Matrix: Women’s Track Example. Component Eigenvalue Cumulative proportion of total variance I 0.1246 0.5509 2 0.0586 0.8102 3 0.0240 0.9164 4 0.0091 0.9566 5 0.0061 0.9836 6 0.0037 1.0000 Table 4. Eigenvectors of the First Two Principal Components. Component Eigenvector linear quadratic cubic quartic quintic septic I -0.9784 0.0065 0.1738 -0.0764 -0.0783 -0.0229 2 0.0316 0.8985 0.0666 -0.3833 0.2005 0.0059 principal component. The data will be analyzed using the approach in Chapters 2-4. The conclusion of this example will be discussed in Chapter 5. 35 CHAPTER 2 PARAMETERIZATION In this chapter, two parameterizations of eigenvalues and one parameterization of eigenvectors are presented. 
Expressions for the first three derivatives of each of the parameterizations are given in Theorems 1-8. These theorems are proven and the results are employed in Chapter 3 to estimate parameters. To make the theorems distinct from the text, theorems and proofs will be set in italic letters. The notation and terminology used here closely parallel that of Boik (2002a, 2002b). Let S be th ep x p covariance matrix with eigenvalues Ai , . . . , Ap and suppose that 7 1, . . . , 7p is a set of orthonormal eigenvectors corresponding to these eigenvalues. Then, the spectral model for the covariance matrix is as follows: where A — diag(Ai, . . . ,Ap) and T = (7i 7% - - 7P)- Let pt and cp be vectors of unknown parameters. The quantities, T and A, are parameterized as functions of /i and (p such that T = P (/Li) and A = A(<£>). The parameters are arranged in a vector, 0: Spectral Model S = TAT (2 .1) 36 where dim(-) stands for the dimension of vector and = za Parameterization of Eigenvalues Denote the p x I vector of eigenvalues of S by A. Then, p p j = i j=i where e? is the j th column of I p. Taking the vec of both sides yields vec(A) = ® Bpj )Xj j ' = i = B e ? ® 4 ) < a . j=i Accordingly, the relationship between A and A is vec (A) = L \ and A = ^vec(A),- (2.2) where p i: = ]T (e ; ® e^)^' and ^ = Jp. (2.3) j ' = i Two parameterizations of A are used in two applications of the dimension reduc­ tion in principal component analysis. These two structures for A are described in each of the applications below. 37 Reduction in the Number of Variables It is assumed that A is a differentiable function of cp, where y is a zv2 x I vec­ tor of unknown parameters. Boik (2002a) introduced several possible structures for eigenvalue parameterization. The structure used here is as follows: X = T1 exp{©(T2y)}, (2.4) where T1 and T2 are full column-rank eigenvalue design matrices with dimensions PXq2 and q2 x v2, respectively and © indicates that the operation is to be performed element-wise. For example, if it is a p x I vector, then exp (©it) = (eUl e“2 • • • eUp)'. The following examples show how to construct T1 and T2. Suppose that A contains u2 distinct values. Let m = (To 1 m2 • • • Tnv2)' be the vec­ tor of eigenvalue multiplicities, where Y^iLi 77L = P- For example, if A = (10 10 5 I I)' then m = (2 I 2)'. The parameterization in (2.4) allows arbitrary eigenvalue multi­ plicities. Consider the p x I vector of A with multiplicity m and ( A1 0 > ( Im1 S x p (^1) o XA2 " , -Tm* expfpg) V 0 Ap j V 0 Imv2 SXp(P1Z2) y If the distinct eigenvalues are unordered, then T1 and T2 can be written as "2 T1 = @ Imi and T2 = I U2. Z=I 38 To order all the eigenvalues from small to large when m = I v and p = 4 = z/2; T1 and T2 can be written as Ti ( I I I V i 0 1 i i o X o o i / and T2 = I u (2.5) To order all the eigenvalues from small to large when m = (I I 2)', p — 4, and z/2 3; T1 and T2 can be written as / I 0 0 \ I I 0Ti = and T2 = I.Z/g'I I I V i i i To ensure that the first eigenvalue is the largest when m set T1 and T2 as I p and p = 4 = z/2; Z I 0 0 o X I - I 0 0 I 0 - I 0 V 0 0 - I ) and T2 — I U2 • To order all the eigenvalues such that Ai = exp{a + /3(p - %)}, % = 1,2, where a and /3 are constants; T1 and T2 can be constructed as ,P, T1 = Jp and T2 = ( lp p), where p is a vector with elements p — i, i = 1 ,2, . . . ,p. 
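To make the eigenvalue parameterization concrete, the following short Matlab sketch (not part of the original text; the numerical value of φ is hypothetical) constructs the design matrices of the small-to-large ordering example in (2.5) and evaluates λ = T_1 exp{⊙(T_2 φ)} as in (2.4).

    % Minimal sketch of the eigenvalue parameterization lambda = T1*exp(T2*phi).
    p   = 4;
    T1  = tril(ones(p));          % lower-triangular design matrix: orders eigenvalues small to large
    T2  = eye(p);
    phi = [0.5; -1; 0.2; 1];      % hypothetical parameter vector
    lambda = T1 * exp(T2 * phi)   % lambda_1 <= lambda_2 <= lambda_3 <= lambda_4
    % Repeating rows of T1 imposes eigenvalue multiplicities; for m = (1 1 2)' use
    % T1 = [1 0 0; 1 1 0; 1 1 1; 1 1 1] with T2 = eye(3), so that lambda_3 = lambda_4.

Under this construction the eigenvalues are positive and nondecreasing by design, regardless of the value of φ.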
To order all eigenvalues in Example I (the study of the quality of pictures) so that the last eigenvalue is the largest, the second last eigenvalue is the second largest, and the first seven eigenvalues have no order but are smaller than the last two; T1 and T2 can be constructed as follows: / i? o7 - J 7 A T1 = I I 0 O7 I and T2 = Jg. V i i o' y 39 Expressions for the first, second, and third derivatives of A in (2.4) with respect to y/ are given in Theorem I. These results are useful for estimating parameters and for constructing likelihood ratio tests including Bartlett corrections as explained in Chapters 3-4. Theorem I (Boik 2002a, in supplem ent). The first three derivatives of A in (2-4) with respect to ip' can be written as follows: D y ' = = T1 d i ag [exp { G (T, J'} ] T2. D f = = {exp(0T2y) ® T1Y W<2>(T2 8 T2), dtp1 ® dp ' ^A dp ' ® dp' ® dp' {exp(02%)}] = e f exp{ef'T2^ j e f '. where 40 Taking the derivative of (2.6) yields 92 n D i? = T i £ > f — [exp{e?'T2V}ef'T2] 2=1 * Q2 = T1 ^ ef={ exp{ef='r2y}ef-'T2( /„ ® ep'Tz)}. 2=1 Expressing exp{ef2,T2y>} as e |2,exp{©(T2^j)} yz'e/flLs 92 U f = ^ T 1E f e f exp{©(T2¥.)} (e f '® e f ) (T 2® T2) *=1 92 = 5 3 (exp{0(T2f)} 'ep ® T1) e f ( e f ® ef*)(T2 ® T2) i = l 92 = ]T (exp{0(%i) is a differentiable function of ip, T3 and T4 are full column-rank design matrices of known constants with dimensions p x ^3 and % x g4) respectively, dim(y>) = ^2, and ^2 < Qi- It is assumed that (2.8) is consistent. That is, 3 £ such that = Co- The vector of eigenvalues in (2.9) depends on T^(T4), the vector space generated by the columns of T4. If I 93 G T^(T4), then T4 in (2.9) can be reduced to a matrix 42 that has one fewer column than the original matrix in the following manner. Let P1 = Ppo(I93) = I 93^ T1I 93 and write T4£ as T& = (Z93 - PX)TA + P1TA = (Zg3 — P l)TA + Ig3QrS Ig3Z ^- Substituting in (2.9) yields X _ ^pZ3 exp{Q((Z93 - P i )T4^ )} l ^ e x p { 0 (Z 93- f i ) l k ( ) } because exp(g^1 I z93T4^ ) is a scalar and can be canceled out. If I 93 G I Z ( T 4i) , then I 93 = T46 for some b . Therefore, if I 93 G T Z ( T 4 ) , then rank[(Z93 - P i )T4] = rank (T4 - I 93^ 1T93T4) = rank (T4 - T4 6% 1I 93T4) = rank[T4 (Z94 - ^ 1I 93T4)] = rank (Z94 — 1I 93T4) because T4 has full column—rank = t r (Z94) - tr ( I93Q2T1I 93) because 6% 1I 93T4 is idempotent = rank(T4) - I. Factor (Z93 - P 1) T4 as (Z93- P 1)T4 = u pv ;, where Up and Vp each have full column-rank. Then (Z93 - P1) ^ = UpC, where 43 Accordingly, T4 should be replaced by Up, a matrix whose columns form a basis set for Tl [(Ig3 — I 93(^1Ig3) T4]. For convenience, the remainder of the thesis still uses symbols T4 and £ instead of Up and £*. For the example in (2.5), T4 = T 2 should be replaced by T4 = ( ) , or any other full column-rank matrix that has column space equal to % ^ ^ y 1 I . Consider the second constraint. It follows from (2.8) that C X = C0 <=> C2exp{©(T4£)} = 0, (2.10) ■ To where C2 is a g3 x r matrix with rank r whose columns are a basis set for Tl [T3'(C 4 — I p C70)] . To show the equivalence in equation (2.10), substitute A from (2.9) to obtain ClT3exp{0(T4£)} l^ e x p { 0 (H £ )} ^ 4=^ C ^ exp{0(Tt£)} = c o i ;# exp{0(Tt£)} 4=^ (C (-C o i;)^ e x p { 0 (3 l£ )} = O -<==> C2 exp{©(T4£)} = 0. Let V = (Vi V2) be a Q4 x g4 nonsingular matrix of constants, where V4 has dimension $4 x r, V2 has dimension g4 x z/2, p2 = Qi - r, and V^ 7V2 = 0. 
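As a numerical check on the leading term of Theorem 1, the analytic derivative T1 diag[exp{(T2 phi)}] T2 can be compared with a central finite difference. The sketch below reuses the illustrative T1, T2, and phi from the earlier example; it is not part of the thesis code.

% Sketch only: verify the first derivative in Theorem 1 by finite differences.
p = 4;  T1 = tril(ones(p));  T2 = eye(p);
phi = [-1.0; -0.5; 0.2; 0.7];
lamfun = @(v) T1 * exp(T2 * v);
D = T1 * diag(exp(T2 * phi)) * T2;              % analytic derivative (Theorem 1)
Dnum = zeros(p);
h = 1e-6;
for j = 1:p
    e = zeros(p, 1);  e(j) = h;
    Dnum(:, j) = (lamfun(phi + e) - lamfun(phi - e)) / (2*h);
end
maxerr = max(abs(D(:) - Dnum(:)));              % should be of order h^2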
Then, T4£ can be written as T4£ = TaV V - 1Z = T4 [V1(F17F1) - 1V1^ + V2(V27V2) - 1V27^ ] 44 — T^V1T + V2 = (VzV )- 1V ^ j and r is an implicit function of y . Substituting T4^ from (2.11) in the constraint (2.10) yields O2 exp{©(T4(VT + V2(p))} = 0. Taking the derivative of the above constraint with respect to r ' yields ^{C% exp{0 (2X V T + V T ^ ( I , ® V 2) + w H^T3W 1V1 } • Applying standard procedures for differentiation yields the following results: d w d ip ' dH x d ip ' dW i d ip ' d 2 r d ip ' <8> d ip ' -W 2VpTz Wi F2, K W2 (F2 ® Iy2), and (2.16) (2.17) (2.18) (2.19) (2.20) To verify equations (2.19) and (2.20), take the second derivative of (2.15) to obtain d 2T d ip ' <8> d ip ' dW-i - ( C ^ W iV l ) - :C ^ ( f ^ gv%) d ip ' (2.21) 48 because V{V\ = I r. To compute ^ r , express Wi in term of summation as follows: 93 Wi = e f exp{e?''24f i = l Applying the product rule yields r)W 93 - g ~ = X M e fT iW e f T t ® ® I n ) ^ i = l = WsfVS <8 JT^). (2.22) Substituting (2.22) in (2.21) yields d 2r dip' ® dip1 8 f ^ ) ( ^ 8VS) = - D ^ U '^ W z iV z 8 V2). Substituting the results (2.17)-(2.20) in (2.16) yields = -w (l) ( i)8 i;2 1 ,W iV s) J T ^ ) Wi t t ) + PowHx TgWsfVS 8 V2) - P o w H x T3 W 1V1 D f 1 C'2W 2{V2 8 V2) = -2w ( p f ] 8 IpTs Wi VS) N v2 + PowHx T3 H i W2(Vr2 8 Vr2). To compute the third derivative of \ with respect to (D H f (g, + PoW Hx H f Ws (^2 ® Vz ® Vf) - 2 w H^ T3 H fW s ^ ® ® (2.29) where N*2 = + {IV2 ® N v2). Combining (2.27) and (2.29) yields D (3)A -2wy,o#V%,H(W2 ® ® V^ ] ~ 2 w { ( D ( i ) ® IpT3WiT^j + I^T3H fW 2(y2 (8) V2) ® D ^ ] } N I +wipoHxTsH^Wsiy?. ®V2® V2). Corollary I. I f C2 has rank = 0, that is there are no additional constraints, then r = 0, V2 = I gi, V2 = cu, y? = £ and the derivatives simplify to n P = WpqH xTsW 1, D m D A (3) -2w ® I^T3W i) N v2 + WpqH xTsW 2, 'x -2w { ® 1JT, W i) + [ l j ^ w 2 ® I N, +wpqH xTsW s- Since Pq defined in (2.7) is a parameter, the first three derivatives of A in (2.9) with respect to pQ are useful for estimating parameters and for constructing the tests. The results are shown in Theorem 4. 52 Theorem 4. TAe first derivative of X in (2.9) with respect to cpo can be written as (i) = dA_ = T3exp{0(T4g)} = A_ l ^ e x p { 0 M } Po and the higher order derivatives are zero. Parameterization of Eigenvectors Boik (1998) described a local parameterization of orthogonal and semi-orthogonal matrices. Boik (2002a) applied this parameterization to the eigenvectors of the co- variance matrix. Let P b e a p x p matrix whose columns consist of the eigenvectors of S and let Po be an orthogonal matrix in a small neighborhood of P. That is P ry P0. Boik’s parameterization is based on the trivial equality P = Jp P. Write the identity 1 ' I matrix as an orthogonal matrix, PoPo- Let G(yu*) = P70P, where G(ix*) G 0(p) and pb* is a vector of parameters. Then, Gijj,*) pz Ip and the parameterization of eigenvectors can be written as P = P0GOu*). (2.30) It is assumed, for now, that P0 is a known matrix. This assumption is not made in Chapter 3 when estimation is discussed. By orthogonality, G is subject to p(p + l ) /2 constraints, GG7 = I p. If each eigenvalue has multiplicity I, then G is a function of p(p — l) /2 parameters denoted by jit*. The p(p — I)/2 elements of /u* correspond to the p(p — I) /2 elements in the upper triangle of G Let r f be an implicit function of jit*, then 77* has p(p + l ) /2 elements. 
Let m = (mi m2 • • • mU2)' be the vector 53 of eigenvalue multiplicities and denote the distinct eigenvalues by ( 2, the columns of Gi can be rotated to annihilate the mi(mi — l ) /2 elements above the main diagonal of Gi2. The dimension of p* is reduced to (p2 — ra 'm )/2 . For example, if p = 6 and m = (I 3 2)', then G can be written as follows: ( V i /4 Pt Pt //g \ Vl Vt 0 0 Pt /4 Vt Vt 7/12 0 Pt /4o Vl Vt 1^3 %16 Pt P l l Vt # 0 Vh V l 7 ?7l9 0 \ 776 vh vh ^8 7721 / Gi G2 6X1 6X3 G3 ). 6X 2 . If 11/it* 1 is small, then G Ip and given /T, GG7 = Ip can be solved for rf . Solving 54 for rj* is discussed in the last section of this chapter. In general, G can be written as vec G = Aifx* + A2Tj*, (2.31) where A i and A2 are known indicator matrices with dimensions p2 x ((p2 — m 'm ) / 2 ) and p2 x (p(p+ l)/2), respectively. Because Az1A1 = I{p2-m'm)/2, A12A 2 = Ip(p+i)/2, and A 1A2 = 0, vec G satisfies A 1 vec G — jj,* and Az2 vec G = rj*. Suppose that interest is in testing the hypothesis in (1.21), H0 : A G n {VM ), where A is a known p x g semi-orthogonal matrix with rank q and M is a p x m semi-orthogonal matrix such that H (M ) The restriction A G T^-(TM) further reduces the number of entries in n*. Theorem 5 describes this reduction. First, however, note that the parameterization in (2.30) is valid for any matrix F0 in a neighborhood of F. It is most convenient to equate F0 to F. In this way, the true value of n* becomes zero. That is, F = FG (aO ) Mlft=O because G(O) = I v. In the remainder of this thesis, F0 will be taken to be F. Theorem 5. Denote C = (M cl ® AT ), where M cM cl = I p - M M 1 and F = r c M U =O - Then> A G Te(FM) GvecG = 0. . 55 Proof. It will be first shown that A G IZ(TM) 4=> A 1T M c = 0. The proof consists of the following two steps-: Step I: proof that A G TZ(TM) = > A 'T M C = 0. Consider A = (T M M 1T 1 + FM cM cT ') A . A e t z (TM ) => f m m T 'a = a = > T M cM clT1 A — 0 = > A TM c = O. Step 2: proof that A 1T M c = 0 ==> A G TZ(TM). A TM c = O =^> A = FM M T 'A = > A G %(FM). Applying the property of the vec, vecpCT Z) = (Z' ® %) vec T , (2.32) where X, Y , and Z are any matrices that are conformable for multiplication, yields A eT Z (TM ) <=> A T M c = O 4=> vec(ATM c) = 0 <*=*► vec(ATGM c) = 0 4=^ C vec G = 0. □ 56 Accordingly, G(fj,*) is further constrained. It was shown in Theorem 5 that A 6 TZ(FM) <==> CvecG = 0. These additional constraints reduce the number of parameters by (p — m)q and increase the number of implicit parameters by (p — m)q. Let V* be a (p2 — m 'm )/2 x (p2 — m 'm )/2 nonsingular matrix and V* = (V3 V4), where Vz has dimension (p2 — m 'm )/2 x Iv1, V1 = (p2 — m 'm ) /2 — (p — m)q, V4 has dimension (p2 - m 'm ) / 2 x (p - m)q, and VlVz = 0. Then, vec G can be rewritten as vec G = A 1 ju* + A 2rj* = A 1F3(F37F3) - 1F3V* + A 1F4(F47F4) - 1F4V* + A2T7* = A1F m T A 1F tT7l + A 2Iq2I (2.33) where T7 l = (V^F1) - 1F tV*, % = ?7*, and /u = (F37F3) - 1F3ZLt*. Equivalently, vec G = A 1VzfJ, + [A1F A2Jt7, where T7 = ^ ^ 1 ^ (2.34) and T7 is an implicit function of fi. Taking the derivative of the constraints with respect to T77 yields / n / \ P o T i / A t z n r v a I (2.35) where D p is the duplication matrix (Magnus and Neudecker, 1999), Dp satisfies vec H = DpVechDr, D is a symmetric matrix. By the implicit function theorem (Taylor and Mann, 1983), if the matrix of derivatives in (2.35) is nonsingular, then T7 is an implicit function of fj implying that 0 - exists. 
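The indicator matrices A1 and A2 appearing in (2.31) and (2.33) can be built directly. In the multiplicity-one case, A1 selects the strict upper triangle of G (the free parameters mu*) and A2 selects the lower triangle including the diagonal (the implicit parameters eta*). The following is a minimal Matlab sketch, with p = 4 chosen only for illustration.

% Sketch: indicator matrices A1 (strict upper triangle of G) and A2 (lower
% triangle including the diagonal) for the multiplicity-one case.
p = 4;
upper = [];  lower = [];
for j = 1:p
    for i = 1:p
        k = (j - 1)*p + i;                 % position of G(i,j) within vec G
        if j > i
            upper(end+1) = k;              % free parameter (an element of mu)
        else
            lower(end+1) = k;              % implicit parameter (an element of eta)
        end
    end
end
A1 = full(sparse(upper, 1:numel(upper), 1, p^2, numel(upper)));
A2 = full(sparse(lower, 1:numel(lower), 1, p^2, numel(lower)));
% A1'*A1 and A2'*A2 are identity matrices, A1'*A2 = 0, and A1'*vec(G) = mu,
% A2'*vec(G) = eta, consistent with the properties noted after (2.31).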
To verify that (2.35) is _d_ ( D 1vVecGG1 \ ' SDpA1F SD^A2 ' drf' \ C vecG J At=O C A 1F4 CA 2 57 nonsingular, the following results are useful: D 1pA 2 — Ipip+i)/2 and (2.36) A 2DpAi — I(p,p) A l. (2.37) The derivative matrix in (2.35) is nonsingular if and only if its rank = ~^^~+(p—m)q. The rank of a matrix is unchanged by interchange of columns, therefore Because the matrix D 1pA 2 is nonsingular with rank by (2.36), the derivative matrix in (2.35) is nonsingular if and only if rank (CAiV^ — C A 2DpA 1Vi) = (p — m)q (Ouellette, 1981). That is, is nonsingular. The matrix V* should be chosen to ensure that matrix (2.38) is nonsingular. A suitable V* and expressions for the first three derivatives of vec G with respect to fi' for a special choice of V* are given in Theorem 6 and Theorem 7. Theorem 6. Define W4 as W4 = C (Ip2 — I v^)) A 1. Express W4 in term of its singular value decomposition as W4 = U(D 0) ( ^ = UDVl> . (2-39) where C is defined in Theorem 5, D is an (p — m)q x (p — m)q diagonal matrix of singular values, and V* = (V3 V4) 6 0((p 2 — m 'm )/2 ). This choice ofV* ensures that the derivative in (2.35) is nonsingular. C{Ip2 I ip ^pf) A 1Vi (2.38) 58 Proof.' It is necessary to show that W4 — C( Ip2 — I Ai has full row-rank. The quantity W4. has full row-rank if and only if t'C (Ip2 — I(P^f)A1 — O 2 — 0. (2.40) Itfollowsfrom (1.21) that M c ° ) R J ’ where R is any (p — m) x (p — m) orthogonal matrix. Therefore, C = [M c' ® AT) (0(p-mxm) A F (2.41) Let F = (Fi F2) which have dimensions p x m andp x (p — m), respectively. It also follows from (1 .2 1 ) that A G Ti(FM) = > A E Ti(Fi) A = F1T for some T A! = T'T'i AT = [T' 0), because Fi and F2 are orthogonal. Substituting in (2.f.l) yields C = [M cl AT) = [0 Ji'] 8 [T' . 0] . (2.42) Expressing A i as ® eDeY > where j > i and w = p2 the left- hand-side of (2 .4 0 ) can be written as p p t' *=1 3=i f[0 0 ] j ( ^ - f ^ ) ) ( ^ 8 e ? ) e y = 0. 59 The following properties of the vec are applied: [X ' ® Y)a,i = vec(Y" dvec(Oi)X ) and dvec(vec(e?e%'),P,p) = (2.43) (2.44) where X and Y are any matrices. Accordingly, ] r W i ' v e c ( [ r 0 ](eH ') ( “ 1=1 j = i ^ ^ \ - t ' v e c ( [ r 0 ](e '< ) ( I R I I 6™' eT Therefore, £ £ t ' v e c ( [ T ' 0 ] ( e ? e f ) ( M ) = ^ vec ([T' OKefe?') 1=1 j = i ^ \ / / i=1 j - i \ Define T* so that t = vec T*', then tr ( r * [ r 0](e?e?') ( ^ ) ) = tr ( t *[T' 0](e?e?') ( ^ ef' ( ” ) T ’ [ r 0]e*\nr' — JP'\e-r I R ) T*[T' 0 ]^ , which implies that Finally, T* [T' 0] is symmetric. R T*[T' 0] 0 0 R T * T 0 and RT*T ' t = 0 , 60 because R is orthogonal and T has full column-rank. Hence, Wi has full row-rank and for fixed V* in (2.39), W 4V4 = UD is nonsingular. □ By Theorem 6,77 is an implicit function of jj, implying that J^r exists. Expressions for the first derivative of 77 with respect to / / can be obtained as follows: dffh dn' d v 2 djj,' 0, - D 1pA 1V3. (2.45) (2.46) To verify the above equations, take the derivative of the vec of the first constraint, GG' = Ip, and use property (2.32) to obtain d vec GG 1 <9/V d vec GG 1 — ( I p 2 + I ( p , p ) ) ( I p ® G ' ) SvecG dfi' — 0 and dfi' 2 #(j,=0 SvecG S/Li' 0, where N p — ^( I p 2 + I(p,p)) - Equivalently, m S vec G ^ S/i' PL=O 0, (2.47) (2.48) because N p = Dp(DpDp) 1D lp. Simplification yields D 1pA 1Vt + D 1pA 1V1^ + -Dp A2Qrj2dfi1 0 61 dfl2 dn' [D1pA 1Vs + D 1pA 1V^ J because of the result in (2.36). Taking the derivative of the second constraint, C vec G = O, yields &ni ,<9 vec G dii' dV2C A 1V3+ C A l Vi ^ + C A 2^ 0 0. 
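In practice, the matrix V* = (V3 V4) of Theorem 6 is obtained directly from the singular value decomposition of W4. The following Matlab sketch assumes that the constraint matrix C of Theorem 5, the indicator matrix A1, and the dimensions p, m, and q have already been formed; the variable names are illustrative.

% Sketch: choosing V3 and V4 as in Theorem 6.  C, A1, p, m, q are assumed to
% exist; Kpp is the commutation matrix I_(p,p), satisfying Kpp*vec(X) = vec(X').
Kpp = zeros(p^2);
for i = 1:p
    for j = 1:p
        Kpp((j - 1)*p + i, (i - 1)*p + j) = 1;
    end
end
W4 = C * (eye(p^2) - Kpp) * A1;        % (p-m)q x (p^2 - m'm)/2, full row rank by Theorem 6
[U, D, V] = svd(W4);
r  = size(W4, 1);                      % number of nonzero singular values
V4 = V(:, 1:r);                        % right singular vectors spanning the row space of W4
V3 = V(:, r+1:end);                    % null-space basis: W4*V3 = 0 and V3'*V4 = 0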
Replacing (2.49) in (2.50) and using the result in (2.37) gives C (Ip2 — I(PjP)) A 1V 1 W tV t dm dn' dm dii' dm dfi' —C (Ip2 — !(p^)) A 1Vs -W tV 3 0, (2.49) (2.50) because W t = UDV l and VlV 3 — 0. Then, it follows from (2.49) that dm dfi' -D fpA 1V3. Expressions for the first, second, and third derivatives of vec G in (2.33) with re­ spect to /x' are given in Theorem 7. These results are useful for estimating parameters and for constructing likelihood ratio tests including Bartlett corrections as explained in Chapters 3-4. 62 Theorem 7 (A dapted from Boik 2002a). The first three derivatives of vec G in (2.33) with respect to n ', evaluated at pi = Q, can be written as follows: = <9 vec G o f dpi' <92 vec G = (JP2 - J(p,p)) AiVr3, At=O At=O 93 vec G • = (1P2 - P )A 2D'p{Ip vec Jp ® Jp)' ® - ( P - Jp2)A2Dp(Jp ® vec Jp ® Jp)' At=Od/LA' ® d p i ' ® d p i1 D f ® /(„,„) Dg>) - (r>g> ® D f ) {/„, + (/„, ® K 1(CK 1) - 1C 1 K 1 = (Ap= - I lplliIA1Vll X where P ■ C K i = W4V4 = UD , and C is defined in Theorem 5. (2.51) Proof. Denote the ith derivative of rjj by r j^ . It follows from (2.33) that D $ = A 1Vz + A 1Vm P + A 2TiP = A 1V3 - A 2D 1pA 1V3 by (2.45) and (2.46) — (Ip2 — I(p,P^A 1V3 by (2.37). The second derivative of vec G can be written as Bg) = A i t q ^ + A2, ,^ (2.52) To compute r]P and r iP , first examine the second derivative of vec G'G: 92 vec G G d p i ' ® d p i ' d d p t ' 2Np 2JVp(]p ® G') 9 vec G d p i ' = O d(Ip ® G') d p i' J^ ® 9 vec G d p i ' + (Ip ® G') 92 vec G d p , ' ® d p , ' = 0. 63 Using the properties, {X Y ) Z = J(rjp) (Jr (g) (vec I s)' O X ) (vecF ' ® I ^ Z ) , where X is p x q and Y is r x s, and N p = N p I(p,p), yields d2 vec G'G At=Odfx' ® dpi' N t { (It ® (vec J ,) ' ® 4 ) (n g ) ® e g ) ) + e g ) } = 0 = e ; { - (Jp ® (vec J ,) ' ® J,) ( e g ) ® e g ) ) + e g )} = o = e ; { - j f + e g ) } = o , where K ■ Ht 0 vec It ® Jp)' ( e g ) 0 e g ) ) and Jfppi e g ) = — e g ) . Taking the second derivative of the second constraint, C vec G = O, yieMs ^ S2YecG dp,' ® dpi,' /t=0 ) (2) _C D q = 0. Substituting from (2.52) in (2.53) yields then { - K + A lV in f + A2Tt f ] = o, I j f = D 1pK - D 'A 1V tn f , by using result (2.37). Substituting from (2.52) in (2.54) yields C [A 1V ^ + A 2T1^ ] = 0 . Substituting rffi from (2.55) yields C (Ip2 - I m )A 1V ^ + C A 2D 1pK = O W4F4T?? + C A 2D 1pK = 0. (2.53) (2.54) (2.55) 64 Replacing W4 = UDV l yields = - [ U D y 1CA 2D ' K . Then, it follows from (2.55) that T1W = D 'K + ^ A 1F4(T O )-1CA2B lii:. Consequently, 1(2) A2B lK - P A 2D 1K [Ip2 - P ) A 2D y ip ® vec ip (g) ip)' (b ^ (gi B ^ j Similarly, D q can be obtained. It follows from (2. S3) that B§) = A 1^ ) + A 2^ . (2.56) To compute Tf1 ’ and ry2 , the third derivative o/vec G'G with respect to fj! is obtained as follows: d3 vec G'G dfi' OfJ,' ® dpi' (i=0 — 1 (^p ® (vec ip) ip) (ip2 I(p,p))" ( s g ’ e D g i ) dy.' (i=0 a (4 ® c ') Bg) d/it' A^=O 0. (2.57) For convenience, the two parts on the right-hand-side of (2.57) are labeled as Ga and Gb- Using the Kronecker product rule, Ga can be simplified to Ga = - N p (ip (vecIp)' ® Ip) (Ip2 ® i(p,p)) + ^ 2 ^ 2 ) (b ^ ® B y ) (Jy1 ® I(UiyHi)) (D g 1O-Dg') 65 Express (vec I p ) 1 as Y^ h=I (ef ® eD an^ repeatedly use the permutation matrix:. {A® B ) = I(pjr)(B A) I (Ctq), where A is r x c and B is p x q, to simplify Ga as follows: Ga = - N p (Ip ® (vec Ipy ® Ip) ® [lu3 + (I 1^ ® . Similarly, G^ can be simplified to = Np { (Zp 8 ( v e c ® ® + a g ) I . 
Combining G a and Gb yields = Np i (fp ® (vec ip)' <8 ip) [ - ( i)g ) 8 D g)) [f,, + (I,,, ® f^,^))] + (Dg) <8 i(p,p) Dg)) ] + Dg) j = D 1p [K* + D i^ = 0, (2.58) where K * = ( ip®veclp® lp)'( (d ^1® J(P1P) - ( D f ® D f ) [Jp, + (I,, « /(„ „ „ ))]) . It follows from (2.54) that C D g ) = 0. (2.59) . Replacing Dg) from (2.56) in (2.58) yields D'p [ k * + A 1W i 3) + A2T7? )} = 0. cr vec G1G dpi1 <8 dpi' <8 dpi' 66 Therefore, Substituting from (2.56) in (2.59) yields Substituting rffi from (2.60) yields = 0. C(Jp, - J(Pf))Ai-%%(3) - CAa^JRT* W4V4T7S3) - C A 2D 1pK* 0 0. Replacing W4 = UDV f yields T7J3) (U D )-1C A 2D 1pK*. Itfollowsfrom (2.60) that i f = - D 1pK* - D 1pA 1Vi (U D )-1C A 2D 1pK* Finally, replacing r j f and r j f into (2.56) yields (P - IP2)A 2D 1pK* (P - Ip2)A 2Dp (Ip ® vec I p ® Ip)' I (d Q ® I(p,p) D f^ j - (I)C?) ® D §)) [I,? 4- (J^ ® I(,ifi))] (2.60) □ 67 Corollary 2. The first three derivatives of vecG in (2.33) with respect to fj,', evaluated at fj, = 0, under constraint A e 'R (TM ), where R,(M) — % ^ ^ ^ can be written as follows: ,(i) <9 vec G djj,' = (IP2 - /(p,p)) A 1Vr3,At=O (2) _ <92 vec G G dfj,' OpL1 At=O = A2-Dp (Ip vec Ip Ip)' ^ d3 vec G d/Lt' ® 9//' g>) { + (J „ ® J ^ ) } . (-Dg1 ® /(„,„) Dg21) Proof. This is a special case of Theorem 7 when Tl(M ) = % ^ Q j ' ^ s m^^ ar to the proof of Theorem 7 when C A 2 = 0 proven in the next theorem. Then, the first three derivatives can be reduced forms because P A 2 is equal to zero. The second and third derivatives of rj with respect to fj,' can be written as ri? = 0, T7P = D 1pK , I7P = 0, and % ' = - D 1pIC □ Under constraint A G Tl(TM ), where Tl(M ) = 77 ^ , the implicit pa­ rameter rj is restricted that Tq1 = 0. This condition reduces form of the first three derivatives of vec G in (2.33) with respect to n ' as shown in Corollary 2. The proof is shown in the following Theorem. 68 Theorem 8. For fixed V* from Theorem 6 and under constraint A G Tl(TM) yield­ ing C vec G = 0, where Tl(M ) = % ^ ^ ^ , then Vi = o, where % has dimension (p — m)q x I. Proof. Note that vec G = AiVgjLt + A 1V4lIg1 + A 21H2, where jit = Vfn*, % = Vfn*, and H2 = H*- It follows from (2.39) that C (IpZ -I(P ^))A 1 — U D V f = ^ C A 1 — U D V lpC I(P tf)A 1. (2.61) Since, C vecG = C A 1V* + C A 2H* = 0, substituting (2.61) in yields UDVln* + C I ( P t f ) A ltJ-* + C A 2H* = 0. Therefore, UD h1 C ^ A ^ + A g^) CAg M A i ^ + f7*). (2.62) 69 It will now be shown that C A 2 — (M cl AT ) A2 — 0 . It follows from (2 .4-2) that (M c' O AT ) A2 = ^[0 R!] O [T' 0]^ A2, where R is any (p—m) x (p—m) orthogonal matrix. Expressing A2 as YJj=i YJi=j(eVj® e ^ )e f, where i> j , w = pip + l)/2 , and I = U- 1Wp-J) j yieids (M cl ® AT ) A2 = ( t0 izI ® [T' 0 ]\ (Bpj ^ e pi )B f. j = l i = j ^ Accordingly, using (2.43) and (2.44) yields (JW0 O A T )A 2 = ^ f ^ v e c ( [T ' 0](e?e?') ( ° ) ) e » j=l i = j ^ V / / where i > j . Choosing j < i < m yields i \ R = 0. Choosing i > j > m yields [T' 0]ep = 0. Consequently, (M cl AT) A2 = C A 2 = 0. By replacing in (2.62), the result is UDrj1 = 0 Vi = 0. □ 70 Solving for Implicit Parameters Denote by £(0) the loglikelihood function of 0 given an observed Y . The MLE of 6 , 0, is the vector which maximizes £(0). To compute this maximization, the Newton-Raphson, algorithm is employed. Let 0O be an initial guess and Oi be the estimate of 0 after the ith iteration. 
Expand £(0) in a Taylor series around 0 = Bi, so that, # = ^ ) + ( e - e , y + 2 ( e - ^ y '^ ( 0 ) 80 O=Oi- ' 0^(9) O=Oi- (2.63) where Qei — cW(9) O=Oi and Hg. = 80 ® 89' O=Oi Set the derivative of £(0) to zero and solve for 0. The result is S = Si - H ^ g gt. ■ (2.64) Solving for r In reduction in the number of components, T4£ can be written as (2.11), where r is an implicit function of (p. For given (p, to solve for r in (2.11) subject to constraint (2.10), C 2 exp{©(T4^)} = 0, the equation (2.64) is employed as follows: T = Ti - H ^ g f i , (2 .65) 71 Qti = Cgexp(G)(J^M)) and Hf, = C^diag(exp{0(l^M )})j;^M where Solving for rj In reduction in the number of variables, vec G can be written as (2.34), where r/ is an implicit function of /Li. For given /Li, to solve for rj in (2.34) subject to constraints, GG' = Ip and C vec G = O, the equation (2.64) is employed as follows: where rj = % %2 ) 9fii V = V i - D) vec(GM GM' - Ip) Cvec CM , and „ _ / 2D;(G«)® JpXA1V4 A2] x “ I CIA1F4 A2] b In the special case described in Theorem 8, vec G can be reduced to vec G = Ai /it + A2TZ2- (2.66) To solve for Ty2, and can be simplified to gt- = D 1pVec(GMGW -Ip) and = 2D;(GMg\ where -ZVp — (-Tp2 - (^p,p))/2 .^rid -D^ ? — (-Zp2 (^Ptp)) • Boik (1998) showed that the maximum likelihood estimate of S , E, can be ob­ tained by solving s^(e) 80 = 0. fi= 0 (3.5) There is no simple, closed form solution for E. Therefore, the solution, E, to this equation has to be obtained numerically. Details are explained in the next sections. Fisher Scoring Algorithm The Fisher Scoring algorithm is an iterative procedure that can solve equation (3.5). The method is based on the derivative of the loglikelihood function and the expected value of the Hessian matrix. Denote by I(O) the loglikelihood function of O given an observed Y . The MLE of 0, 0, is the vector that maximizes 1(0). Let Oq be an initial guess and Oi be the estimate of O after the ith iteration and let Igi be Fisher’s information at the value O = Oi. That is, Ia = - E " a ^ (e ) O=Oi- Expanding £(0) in a Taylor series around O = Oi yields (3.6) £(0) sd £(0i) + (0 — Oi)1 gg. — ~(0 — 0i)'Ig.(0 — 0i). (3 .7) 76 Set the derivative of t{d), expressed as (3.7), to zero and solve for 0. The result is 9 = & + (3.8) where g§. is defined in (2.63). The right-hand-side in (3.8) becomes the new guess and the procedure is repeated until convergence. The above description is a conventional application of the Fisher scoring algo­ rithm. The present application is not conventional. Details are given below. Solving the Likelihood Equation Because there is no closed form solution for the MLE, the solution must be ob­ tained numerically. To compute the maximum likelihood estimates numerically under parameterization of A in (2.4), Boik (2002a)’s algorithm is adopted as follows: Step I Compute initial guesses for f and A (discussed in the next section). Denote these guesses by F0 and A0, respectively. S tep 2 Denote the maximum likelihood estimate of S after the ith iteration by S i = FiA if; Step 3 Set /Lti = 0 then Si = (O' <£>')'. Use one iteration of the Fisher scoring algorithm to update Si, that is, Si+i — Si + Igi1Qgi, 77 where g§. is obtained from (3.2) evaluated at 0 = Oi and I§. is defined in (3.6). This is not the conventional Fisher scoring algorithm because fa is not updated from iteration to iteration. Instead, fa is set to zero and updated from zero in each iteration. 
Step 4 By using the updated 0i+i from step 3, compute G( fa+1) from model (2.31) or (2.33), and Aw = A((pi+1). Step 5 Update f i+1 = t iG ( fa +1). Step 6 Iterate steps 2-5 until convergence. To numerically compute the MLE under parameterization of A in (2.9), step 4 of the above algorithm is replaced by Step 4 By using the updated 0i+i from step 3, solve for = V i f i+i + i+1, where Vi and V2 are given in Theorem 2. For given <£j+1, f i+i is obtained from (2.65). Compute Am = A(^m ) and G(fa+1) from model (2.31). Finding the Initial Guesses In using an iterative procedure, the original initial guess is very important. The algorithm may not converge to the MLE if the original initial guess is far from the MLE. The following manner is used for making initial estimates of f and A. 78 To find an initial guess for f under the hypothesis (1.21), H0: A G TZ(TM), the singular value decomposition of the sample covariance matrix is employed as follows: S r*A*f*/. f ; f ; 1 Assume that the diagonal entries of A are sorted from small to large and f is a matrix of the corresponding eigenvectors. The quantity f can be partitioned as , where and fg have dimensions p x m and p x (p - m), respectively. Denote the original initial guesses for f and A by F0 and A0, respectively. Let A* be a p x m semi-orthogonal matrix, where A* — (A Ac) and A c is orthogonal to A. Let Q be a block diagonalp x p orthogonal matrix which, when applied to [A* A*c] , results in a matrix which is as similar as possible to f . To find F0, choose Q such that F0 = [A* A*c] Q t t f * , (3.9) where A*c is an orthogonal complement to A*. Writing Q as a block diagonal matrix yields [A * A*1 ( E?1 Q2 ) “ (f i f O ) where Qi G 0(m ) and Q2 G 0(p — m). Accordingly, A*Qi % F1 and (3.10) A ^Q 2 % fg. (3.11) To solve for the orthogonal matrices Qi and Q 2 , the orthogonal rotation to congru­ ence by Cliff (1966) is repeatedly applied. This method minimizes \\A*Qi - F 1H2 79 with respect to Qi for (3.10) and \\A*CQ 2 — I^H2 with respect to Q 2 for (3.11) and maximizes tr( T e1Q 1A*1) and t i (T*2Q'2A*C), respectively. The results are shown in the algorithm below. It follows from (3.10) that [A A c] = T lQ 1 = IB1 B 2], (3.12) where B i and B 2 are the first q and the last m — q columns of T e1Q1, respectively. The quantity A c must be orthogonal to A. Accordingly, let A c = (Ip - A A 1)F for some F , where F is orthogonal by columns. Thus, (Ip - A A 1)F = B 2. (3.13) The following steps give an algorithm for computing T0. Step I Begin with any orthogonal matrix Q1. Step 2 By using Q 1 from step I, compute B 2 from (3.12). Solve for F (Cliff, 1966) in (3.13) as follows: write the singular value decomposition of B'2 (IP -AA ') as B 12 (Ip - A A t) = U iD iJ[ = J iU 1. Step 3 Compute Ac and A* by using F from step 2. Solve for Q 1 (Cliff, 1966) in (3.10) as follows: write the singular value decomposition of f^A* as T 1 A* = U2D 2J t2 = > Qi = JiU2. 80 Step 4 Repeat steps 2-3 until Qi converges. Step 5 Solve for Q2 (Cliff, 1966) in (3.11) as follows: write the singular value de­ composition of f 2 A*c as t*A*° = U3D 3J^ = > Q2 = JzU3. Step 6 Obtain f 0 from (3.9) by using Qi and Q2 from steps 4-5, respectively. To find A0 or <£0 that satisfies model (2.4), let A = diag(A ). 
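The orthogonal rotation to congruence used in Steps 3 and 5 of the algorithm above has a closed form. The following is a minimal Matlab sketch of Step 3, assuming that the semi-orthogonal target A* and the block of the sample eigenvector matrix it is matched to are already in the workspace; the names Astar and Gamma1hat are illustrative.

% Sketch: orthogonal rotation to congruence (Cliff, 1966).  Astar and Gamma1hat
% are assumed to be p x m matrices in the workspace; Q1 below minimizes the
% Frobenius norm of (Astar*Q1 - Gamma1hat) over orthogonal Q1.
M = Astar' * Gamma1hat;                % m x m cross-product to be aligned
[U, D, V] = svd(M);
Q1 = U * V';                           % maximizes tr(Q1'*M); Q1 is orthogonal
% Step 5 solves the analogous problem for Q2 with Astar and Gamma1hat replaced
% by their orthogonal complements.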
Differentiating ||A - T i exp{©T2y 0}||2 or minimizing ^A* - Ti exp{©T2v?0} j ^A* - Ti exp{©T2y>0} j with respect to exp{©T2<^ 0} yields 9 ^A — Ti exp{©T2y?0}^ ^A — T l exp{©T2y?0}^ 9exp{©T2y 0} Equating the derivative to zero yields T1rTiexplQT2^ = T1rA exp{QT2 = (T1rT1)-1 T1rA*. If (T1rTi)-1 T1rA* contains positive numbers only, then taking the log of both sides yields dp ' /J,=o i (.Dg1' (AT ' 0 T l) (S "1 0 S 1) (T 0 T )LD '1! \ D f (Ip m - 1) L D ^ . Expressing L as in (2.3) gives h m = I B g 1X (e? 0 ej) eJ'A-1 D f J=I = j T ( I ? - (e? 0 e JX A r1 D f J=I = 0, because i(p,p) (e^ ® e^) = (e^ ® e^). Similarly, It can be shown that - 0,tpii — ' d2£(0) ' dtp dp' ± D P 'L ' ( I r ® A -l)D , 2 = 0. ( i ) G Hence, Ig is a block diagonal matrix. □ 84 Due to the block diagonal structure of Ig, p, is asymptotically independent of cp which implies that f also is asymptotically independent of A. Accordingly, it follows from (3.15) that the joint asymptotic distribution of p, and cp is as follows: VS(A-M ) A w ( o j ^ ) , (3.19) VS(v - V) A JV (o, ^ ) , (3.20) and p, R ip. Asymptotic distributions of functions of 0 can be obtained by the delta method. Let 0 be a random p-vector with asymptotic distribution y/n(0 — 0) —^4 N (0 , Ig 1). Suppose that g(0) is a differentiable function. It follows that the asymptotic distri­ bution of g(0 ) is \/n[g{0 ) - g{0 )] N O,D(0)Ig 1D(O)' ? where D ( 0 ) = dg(d) 96' 0=0 The proof can be found in Sen and Singer (1993). To find the asymptotic distributions of eigenvector estimators, consider that t = TG vec f = (Ip r) vec G. 85 The Taylor expansion of vec f around /1 = 0 can be expressed as vec f = vec F + (lp ®F) (/I — /z) + Op (n-1) and Vn vec ( f - F) = y/ri(Ip ®F) (/I - //) + Op (n-1/2) . Using the delta method and (3.19) yields V5vec(f - r ) A j v f o , ( ir o r ) o f r : ‘ D ^ ' (ir o r ') (3.21) If Vi is identifiable, it follows from (3.21) that V5(7i - l i ) N 0, (Ir o e jr ) D ^ '( I r OVei) (3.22) Similarly, the relationship between A and A is X = L' vec A. The Taylor expansion of A can be written as A — A T ^(y — ( p ) + Op(n )^ and Vn(A -A ) = Vn (

oo, - 0 ) N (o, I 0 1M 0 , where f l = Iimn^ 00 (S -1 ® S -1) ^fn (S _1 ® S -1) and ^ rn = nvar(vec S). Boik (1998) provided an unbiased estimator of Asymptotic distributions o f f , J i, and A can be obtained by substituting I 0 1H I0 1 for I 0 1 in equation (3.21)-(3.23). 87 CHAPTER 4 TESTING HYPOTHESIS AND BARTLETT CORRECTION In principal component analysis, it is often of interest to test hypotheses concern­ ing eigenvalues and eigenvectors of the covariance matrix. The matrix S is rarely known and tests of hypotheses must be based on an appropriate estimate of S (dis­ cussed in the previous chapter). This chapter deals with procedures for testing di­ mension reduction. The procedures are based on the standard likelihood ratio test. The hypotheses of interest concern the dimension reduction in the number of com­ ponents and in the number of original variables. For each hypothesis, the likelihood ratio test statistic is derived and improved by multiplication by a Bartlett correction factor which provides a better approximation to a X 2 distribution. Hypotheses Let Ai < A2 < • • • < Ap be the eigenvalues of S and T = ( J 1 J 2 - ■■ 7P) be a matrix of the corresponding eigenvectors. In testing dimension reduction in the number of components, the eigenvalues give information about the dimension of the data. If the cumulative proportion of total variance accounted for by the p — k smallest eigenvalues is small, then little information is lost by discarding the corresponding components. The general null 88 hypothesis of interest is defined in (1.8) as where Ci is a matrix of known constants and C0 is a vector of known constants. This hypothesis is tested against the alternative Ha which says that Ho is not true. Specifically, to focus on dimension reduction in the number of components, Ci is defined as (l'v_k O7fe)' and C0 is a small number recommended by an expert in the area of study. The test of this hypothesis is useful for constructing a confidence interval for proportion of variance. In testing dimension reduction in the number of original of variables, the general null hypothesis of interest is defined in (1.21) as where A is a fixed p x g semi-orthogonal matrix with rank q and F is a matrix of eigenvectors. The alternative hypothesis is the general Ha which says Ho is not true. Suppose A is equated to A = {Iq O')'. Then, the hypothesis to be tested is that the first k components have zero loadings in the last q positions. This test is the redundancy test. If the hypothesis is not rejected, then the last q original variables have zero coefficients in each of the first k components. Hence, these variables are redundant and can be discarded from the study. In deriving suitable tests for testing these hypotheses, the well-known likelihood ratio tests are used. 89 Likelihood Ratio Test This section introduces the likelihood ratio test (LRT) related to the maximum likelihood estimates discussed in Chapter 3. The general strategy of the LRT is to maximize the likelihood function under the null hypothesis, and also to maximize the likelihood function under the alternative hypothesis. For the loglikelihood function in (3.1) and the maximum likelihood estimates from Chapter 3, the likelihood ratio test statistic is L 0 ( 0 O) Zfa(Oa)' (4.1) where L0(Oo) and La(Oa) are the likelihood functions maximized under H0 and Htt, and 0O and Oa are the maximum likelihood estimators under H0 and H0 respectively. 
Equivalently, it follows from (4.1) that H0 is rejected for large values of Q = - 2 In A* = 2 { 4 (0 ,) - Z0(O0) ) = D(0,; 0) - D (0O; 0), (4.2) where D(O) 0) is defined as 2 |l (0 ) - 1(0) j and i(-) = In L(-). Denote the dimensions of 0O and Oa by z/0 and z/0, respectively. The test statistic Q asymptotically follows Q (4.3) where The above general result may be used to obtained a test statistic in variety of situations including hypotheses of interest in (1.8) and (1.21). Boik (2002a) discussed a way to compute D(O) 0) in the following manner. 90 Denote the ith derivative of the loglikelihood function, evaluated at // = 0, by £i(9). That is, 4 0 = T - 4 0 = 90 '@ 90 ' , (0] 8V(6) ^ ^ 0 0 '(E) 00'(3,00' and so forth. Let E -J = n-1 E(li(0)) and ^ , where E(Zi) = 0. Therefore, £i(0) = ■\fnZ{ + nKi. (4.4) Let 0 be a consistent root of the likelihood equation such that l\(0) = 0. Expanding l\(Q) in a Taylor series around 6 = 8, substituting Ii(O) from (4.4), and dividing both sides by ^/n yields 0 = Zi + (ji 2 9 + (j1 + n ^ (g (E) g) (n - l z 4 + U-1ET4) ( g® g® g ) + Op , (4.5) where g = ^ ( 6 — 6) and K i = 0. Let g — -\/u(0 — 6) — Sq n 2Si n 1^ 2 + Op ( n^ 2) . (4.6) 91 To solve for Si in terms of Z i, replace g in (4.5) by (4.6) to obtain polynomial functions of n which have zero coefficients. Collecting each of coefficients and solving for d, yields the following results: d0 = I g 1Z 1, <5l = Ig 1 I ^ 2 0^ + 2 (^0 ® 0^) J- ’ ^2 = Ig 1 "I ^ 2^1 + 2^3 (^o ® do) + Ks ^do <8> di) + — !^4 ^do ® do do) ^ , where K 2 = - Ig . To express D(6-, 0) in terms of Z i, i = l , 2 ,3, expand D(0] 0) in (4.2) in a Taylor series around 0 = 0 and substitute £i(0) as (4.4) to obtain D(0] 0) = 2g'Z 1 + g' ^Z 2-\- K 2J g + -g ' 1Z^ + n ^.Ks) (gf ® g) + - g ' ^n- I z 4 T n -1K 4) (g ® g ® g) + Op (n - l ) . Substituting g from (4.6) and collecting terms of Op(I), Op(n- 1) and Op(n-1) yields D(6; $) = Z ? I eZ l + n~i { Z ?K 3 ( Z l® Z { ) / 3 + Z t1Z 2ZD (4.7) +n - i {ZsZ; + Ka(Z* ® Z*)/2}'T^ {Z^Z* + Ks(Z* ® Z*)/2} +Z*'K4 (Z* ®Z*® ZH /12 + Z^Zs (Z ^® Z*) /3 + Op (n -# ) , where Z^ = I 0 1Z 1. Because it can be shown that Z 1A iV (O 5J e) , 92 the Op(I) term in (4.7), Z^'IgZ^, is asymptotically distributed as X l and E (Z^1IgZ f ) v. The Op and Ov (n-1) are useful for constructing Bartlett corrections. Ma­ trix expressions for Z i and K i for 1(0) in (3.1) are shown in the last section in this chapter. Bartlett Correction It follows from (4.3) that <3 ~ X L + °p ( " _ l ) • The Bartlett corrected test statistic (Boik, 2002a) is Qc = Q %/AE(QIH0) which has null distribution Qc ~ + Ov j , where E(QlHo) = E {Dfe,,; e,,)|H0} - E {d (»o; #o)|H0} (4.8) (4.9) The approximate expectations can be obtained by evaluation of the expectation of equation (4.7). To compute the expectation of D{6\ 0) in (4.7), the third and fourth order moments of vec(S — S) are needed. Third and Fourth Order Moments Boik (2002a) discussed an easy way to get these moments directly from the cumulant generating function in the following manner. Let S = ; where Ui iV(0,E), then S ~ Wv(n, S /n ). Note that Dp and H p satisfy Dp vech X = 93 vec-X", H p vec X = vech X , and DpH p = N p, where X is any p x p symmetric matrix and N p is defined in (2.47). The moment generating function of Vech(Sr) can be written as M vech(S) (t) = E [exp (t' vech (S))] = E [exp ( f ITp vec (S))] |E*|n/2 ™ • |S |n/2 ’ where vech S* 1 = vech S -1 — | [D1pDp) 11. Then, the cumulant generating function of vech (S) is = | l n | S * | - | l n | S | . 
To find the first four derivatives of the cumulant generating function, the following results are repeatedly used: 9 vec 2 * a t' a vec E* : — HL and - (£* O E*) H 'n F (4.10) (4.11) The first derivative of the cumulant generating function can be written as ( t ) ac(°)(t) aC(°)(t) nd ln |E * | 2 at as* ac(o)(t) = (vec H*)'H' 94 Therefore, C w (t) = HpVecE* or vechS*. (4.12) The second derivative of the cumulant generating function can be written as C {2\ t ) = __ avecSdt' ® dt = H n = - H Pcs* o (4.13) The third derivative of the cumulant generating function can be written as c (% ) .d3C ^ { t ) _ 2h d{S* ® dt' 0 dt' ® dt n p dt' L (I , ®(vec Jp)' ® Z*) <8 J f ; -^Jfp (S* 8 (vec 2*)' 8 E*) % 8 JTp)', (4.14) by (vec Jp)' (E* 8 Jp) = (vec E*)'. To compute the fourth derivative of the cumulant generating function, the following results are useful: E (A B C D) = {(vec B) ' 8 E } [(Ir 8 vec A )C 8 D ] , where B is q x r, A , C, D, and E are any matrices that are conformable for multi­ plication, and E (A 8 BCD ) = {(vec C)' 8 E } I(tP,qr) [A 8 (vec B ) 1 8 D ] , where A is t x s,. B is p x g, C is g x r, and D and E are any matrices that are conformable for multiplication. The fourth derivative of the cumulant generating 95 function can be written as ( t ) ^C(Q) (t) d t <8> d t ' ® d t ' ® d t - H p^ (ip ® vec Ip ®(vec S*)' ® S*) + ® (2* # (vec S*)' ® vec f , ® Ip) + (ipfp+D O Hp) ^ ® S* 0 (I(p,p2) ® Ip) I (Up ® H p )'. 8 f /a(vec2*)V Substituting (4.11) and simplifying yields OO c ^ ( t ) = — (Ifp®Ifp)iVp,{2*®(vec2*y®vec2*®E*}(IZp®apy (4.15) +%#„ ® Hp) (^ ® I(p,p) ® ip) (E* ® S* ® S* ® S*) (ilp ® Hp)'. The matrix S*, evaluated at * = 0, simplifies to S. Suppose to is a random vector with E(tu) = cr. It can be shown that C^(O ) = E(tu) and (O) = E {(iy — a)(w — cr)'} . Equivalently, C^(O ) = HpE[vec(S - E){vec(5 - S)}'] H 1p, (4.16) where w = vech S = H p vec S . Pre-multiplying (4.13) by Dp and post-multiplying by .Dp and using (4.16) yield E {vec(S - S){vec(5 - E)}'} — N p^E ® E) = var(vec S), (4.17) because N pvec(S — S) = vec(S — S) and -ZVp(£ ® S) = (S ig)S ) N p. The third cumulant of w can be written as 9 6 . (0) = E {(to — cr)' <3 (w — cr)(w — cr)1} . Equivalently, C (3)(0) - H pE [{vec(S - S)} ' 0 vec(5 - S){vec(5 - S)}'] (Hp 0 Hp)', (4.18) where w = vech S = H pYecS. Pre-multiplying (4.14) by Dp and post-multiplying by (Dp 0 Dp)1 and using (4.18) yield E [{vec(S - S)} ' 0 vec(5 - S){vec(5 - S)}'] = ^ JV P{E 0 (vec S )' 0 S} x (Np 3 N p). (4.19) Similarly, the fourth cumulant of vech S can be written as C^(O) = ( g p O # ,) E { vec(3 - S){vec(S - S)}' 0 vec(S - S){vec(S - S)}'} —2Np2 [E{vec(5 - S){vec(5 - S)}'} 0 Ejvec(S - S){vec(5 - S)}'}] — E{vec(S — S) 0 vec(S - S)} E{{vec(5 - S)} ' 0 {vec(5 - S)}'} x (H p 3 H p)1. (4.20) Pre-multiplying (4.15) by (Dp 3 Dp) and post-multiplying by (Dp 3 Dp)' and using (4.20) yields 97 E [vec(S - S){vec(5 - £ )} ' ® vec(5 - S){vec(S - S)}'] = 32 —N p2(Np ® jVp){5] ® (vec E )' vec E ® E }(-ZVp ® JVp) ■ +^(AZp ® AZpXJp ® -T(Pf) ® 4 ) ( S ® E ® E ® E)(AZp ® AZp) +2AZp2[E{vec(5 - E){vec(5 - E)}'} 0 E{vec(5 - E){yec(5 - E)}'}] ■ + E{vec(S - E) ® vec(5 - E)} E{{vec(5 - E)}' ® {vec(S' - E)}'}. (4.21) To compute the last part of (4.21), the second derivative of (t) with respect to t evaluated at * = 0 can be obtained as D2C ^ (t) _ 2 d t® d t n (Hp ® Hp) vec (E ® E ) . (4.22) Algebraically, d2C ^ ( t ) dt ® dt E{vec(S' — E) ® vec(5 - E)}. 
(4.23) Pre-multiplying (4.22) by (Dp ® Dp) and using (4.23) yields E{vec(S — E) ® vec(5 — S)} 2 (AZp ® AZp) vec (E ® E ) . (4.24) n Similarly, E [{vec(S - E)}7 0 {vec(5 - E)}'] = ^ {vec (E 0 E)},(AZP 0 AZp). (4.25) Substituting (4.24) and (4.25) in (4.21) yields E [ vec(£ - E){vec(S - E)}' 0 vec(S — E){vec(S - E)}'] — 32AZp2(Np ® AZp){E 0 (vec E )' 0 vec E 0 E}(AZP ® AZp)rv 98 +-g (A?, ® -Np) (ip # 8 ip) (S 8 Z 8 S 8 Z) % 8 TVp) g + — iVp2 (Np 8 JVp) (Z 8 Z 8 Z 8 Z) (Np 8 N p) + - ^ (N p 8 N p) vec(Z 8 Z){vec(Z 8 Z)}'(iVp 8 N p). (4.26) The results of the third and fourth order moments of vec(S — Z) are used for com­ puting the expectation of D(0-, 0) in (4.7). Expectation of LRT Statistic Partition 0 as 0 = (fi' tp')' with dimension (vi U2)', where u = ui + u2. In particular, k = 2. Denote the derivatives of vec Z with respect to 0 as follows: !Pd) - dvecZ _ A dvecZ S= I 5 f2s d2 vec Z A A 52 vec Z . . ^ - 0 0 '8 0 0 '" ^ 5Z 3% e 0 0 ; ® S= I t = i 5 z = ^ YeC^ l- — dvec (F(2\ p 2u, u ) ; 00 8 00 0^ vec Z k k k 03 vec Z S=I t — 1 U = I^ - 00' 8 0(9'8 6)0' " 6)0: e 0 0 :8 00 :/^ '" ® ® jp(ni) _ dvec (F^3\ p 2u2, u ) . The matrix E iyl, with dimension z/% x z/ for I < % < & is defined as Fiyl, — [Oyjxaj Jyj 0UiX{v—ai—Vi) ] • where a, = Y^j-i vj ~ vi- For convenience, define J 1O), j p (2) , f X^\ and F' ^ as F {1) = (Z "1 8 Z ”1) F (1); F (2) = (Z -1 8 Z " 1) F (2); l ( 1 ) (Z -1 8 vec Z - 1 8 Z "1) F ^ ; and F 2 = (Z-1 8 vec Z "1 8 Z ' 1) F® 99 Boik (2002a, in a supplement) derived the expectation of the right-hand-side of the equation (4.7) as follows: I E{D(9; 0)} = dim(O) 4- & + O M , (4.27) where Cl u {1) Cs Cs C4 L 1 Cs Ce Cr Cs U(% tr { (lg-1 p w 0 JVp) (Ip O v e cS o Ip) Iq 1Uw } ; (I^ 0 S -^ 0 S-^) - 2 ( f 0 lp ,) ; \ tr { ( F w I ' 1 O F w I q 1)' F (1)/ I q 1K ; ) ; 2 F {1)' (Fw O F w ) - dvec F (2), z/2, z/)' - ^ F w ' F (2); ^ tr [{iy" 0 JVp(S 0 S)} [ / ( ^ " [ 7 ^ ] ; L 1Iq 1L 1 + L 12 ( IM O I0-1) X2; if /W vec (F^) Iq 1) ; X2 = ^ vec { t / ^ (l„ O F « I 0" 1) } ; ^ t r {JV„ ( I ^ 1 0 1^1) ^ ( v e c ^ 1) ' ^ ' ^ 1^ vecj^-1; tr {JV„ (Iq1 O I g 1 F w ') U wiI g 1K ; ] ; ^ (vecI ^ 1)' vec ( F ^ I ^ 1) ; i tr { JV, ( I g 1 F w ' O I0- 1) U (2)} ; —6 dvec ^F*' ^ , p2zv, p2) F ^ + dvec dvec { (S 1O S ' -1) F ^ ) ,p V , '} ' ,P2v, P2. +2 (IP2 O F (1))' (Ip OvecS 1O Ip) (2 JVp + I(P)P))X3; 100 (s = l { v e c ( ^ ^ ) y u ( " ) ( v e c ^ ) ; Cio = ^ t r { N X ^ ® ^ ) ^ } + l ( v e c ^ ) ' ^ v e c i y \ and j q = 12 - 9 ^ - 2 dvec (f (1)/ F (3\ u2, z/2j - ^ F®' F {2). Under a Restricted Ha To compute the Bartlett-corrected test statistic in (4.8), the expectation in (4.27) is used twice to obtain the right-hand-side of (4.9). Under an Unrestricted Hffl To compute the Bartlett-corrected test statistic in (4.8), the expectation in (4.27) is used once to obtain E {-D(0O; 0o)|Hq j in (4.9). To obtain E ^D(0a] 0a)|Ho| , Boik (2002a) showed that if S is unrestricted under H0, then the asymptotic distribution of Q in (4.2) is X lAi where . Furthermore, b {5(»„;6„)|Ho} = + P (2P2 + to - 1J + o (n~2) . (4.28) To verify equation (4.28), it follows from (4.2) that B {D (0o;0o)|Ho} = 2 E { f ( 0 , ) - W } = 2E — In |5 | + - I n |S | | = 2E { - | I n |nS| + y ln(n) + | In |S |} . 101 Seber (1984) showed that InSfI ~ |S | HLi X n - i + i - Accordingly, p E 0a) |H0} = n p ln (n ) -n J ^ E ( ln X ^ _ i+1) . (4.29) i = l The expectation of ln(y), E{ln(y)}, assuming y ~ can be obtained by differenti­ ating both sides of r (D = I iexp{-% }j 2# % (4.30) with respect to n, where T(-) is the gamma function. 
The useful result concerning the digamma (-0) function is as follows: ainr(ra) _ r(w ) dn T(n) " Then, the result of differentiating both sides of (4.30) yields ^ G D = I V2 iexp{-^} 2 ir(% ) 2J dy — ln(2). Therefore, E{ln(y)} = -0 ( I ) + ln(2). Expand the digamma function to obtain ( I ) + ln(2) - ln(n) “ ^ “ 3 ^ + ' * ‘ (4-31) Applying result (4.31) to (4.29), simplification yields the result (4.28). 102 Matrix Expressions for Z i and K i To obtain matrix expressions for Z i and K i, the second and third derivatives of vec S with respect to O1 evaluated at jit = 0 are needed. The following properties are repeatedly used: {A BC ® D )E = [A ® {vec(C")}/ D]{vec(JB/) ® E} and (4.32) { A ® B C D )E = {A (g) vec(D') 0 B } (B 0 vec C), (4.33) where A, B , C, D, and E are any matrices that are conformable for multiplication. It follows from (3.4) that d2 vec S d^i' ® dy' 2N„ (TCA ® r ) d2 vec G dn' ® dii' I ® d vec G X ~ w ~ ) : Applying (4.33) yields d2 vec E dfi' ® dn' 2 Np (TC?A ® r) a2 vec G dii' ® dfi' +{T ® (vec A)' ® r} T 9 vec G 1M -Q ^ r - d vec G dfi' (4.34) Therefore, evaluated at /4 = 0, 92 vec E dfi' ® dfi' = 2 AL £1=0 (TA ® r ) + {r O (vec A)' ® r} (l(p,p) ® 103 Also, it follows from (3.4) that - “ 4 {,r»“ r|2 s ^ - > » , # '^ j(r(g ,(v ec jpy (g ,r) by (4.32). Therefore, evaluated at ^ = 0, d2 vec S d(p' dpi' At=O = 2 N v (r ® (vec I v)' ® r) ® B g }} . (4.35) The second derivative of vec S with respect to ip' can be written as d2 vec S dip' ® dip' ( r ® r ) X B f . At=O Similarly, the third derivative of vec S with respect to y>' can be written as d3 vec S dip' dy?' ® dyi' = ( T O T )T B f . At=O Next, it follows from (4.35) that d3 vec S dy ' O dy?' ® dp,' It follows from (4.34) that At=O d3 vec S dip' O dpi' O d/T = 2 AL 2 TVp {T ® (vec Jp)' O T} B f O B f I . d

y 0> ;4.38) Combining (4.37) and (4.38) yields 93 vec S <9y/ O 9/Li' O 9/Li' 2 AL H=O {r O (vec Jp)' 8 r} { & 8 } {(•+ <( (vec L ) o r o r X 8 ( jp 8 j(p,p) 8 jp) (D y ® Next, it follows from (4.34) that 93 vec S______________= 2 N 9/Li' O 9/Li' O 9/Li' p 9 9 - ( ( r G A ® r ) | l vecG 9 /i' O dfi' Zt 9 vecG _ 9 vecG \ (4.39) 9fi' [ ~ { r ® (vec A)- ® r } ™ 7 - ® - ^ r - J For convenience, the two parts on the right-hand-side of (4.39) are labeled as E c and Ed- Using the product rule and (4.33) and evaluating at /i = 0 yield #C = {r 8 (vec A)' 8 r} (j(p,p) J^ y 8 J^y) + (FA 8 T) G g ). (4.40) Using the Kronecker product rule, E a, evaluated at /Li = 0, simplifies to % = 2 iVp {r 8 (vec A)'8 r} ( j^ rg^ ry ) + I ( p 2,p2) O y ® J(p,p) G y ) ( I v 1 O I(vi , i>i)) 105 Using the following results repeatedly: [ A ® S ) — I(p,r ) { B ® A ) I ( Ctq), where A is r x c and B is p x q, yields E d = 2 N p I (r dp,' ® dfi' = 2 N n H=O (FA ® r) D i^ +{r ® (vec A)' ® T} x <1 (Z(Pf).Bg) ® B g )) + 2 (/(pf) Z)g) ® B g )) (Z ^® The other order derivatives can be rearranged by post-multiplying by commutation matrices. d2 vec S _ d2 vec S dG't ® OO1s = dO's ® BG1t livt^ d3 vec S _ ^3 vec S . . dG't ® OG1s ® _ dG '^ dG '^ dG '} 1^ ’^ ® Vu) _ S3 vecS (t ^ T \ The following properties are useful for simplifying expressions. F (1) = I M .FM F<2> = I m FW = F ^ I m f< n > = ( / „ « / , , ,„ ) I ’M 106 fW = I m .F<8> = F® = jr<«) = I(W)) = ® I v) F auj = V m »/#>) ^ llul = (J»- ® /(»,)) i ?(lll) ^ ^ ' ( E - 1 ® S " 1) = 2Ig F (1),(S ® S) F (1) = 2lg (S _1 ® S -1) vec S = vec S -1 d v e ^ = - ( 2 - i <8 D -i) F ^ = F « It follows from (4.4) that K i = 0. Then Zi = ^ F ^ ( S - 1Ig) S -^ v e c (S f-S ) 2 = ^ F (1),v e c (S -S ) . The second derivatives of £(8) can be written as 4 (9 ) = ^ FM '{ f^ < S ,(S -^ S ^ )v e c (S f -S ) } + n F (1)/ ( Ip(S)(VecIp)' S -1} ® vec(5 - S) j^ " F {1)' (S -1 Ig)S-1) F (1). Accordingly, ^ (0 ) = ^ ( ^ ' { ! ^ ( S ^ e S - ^ v e c ^ - S ) } -MF^)' ( s -1 ® (vec S -1)' ® S -1) (F ^ 8 vec(S - S)) " F {1)' ( S - 1Ig)S-1) F (1). 107 Taking the expectation of I2 (0) and dividing by n yields K 2 = (S _1 ® S - 1) F (1) = ~ F (1)/ F (1) = - I 0. Therefore, Z2 = ^ F (n > {Jv ® (S "1 ® S - 1) vec(5 - S ) } - Vn F ^ ' (F^1) ® vec(S - S ) } . Boik (2002a, in supplement) gave expressions for Z3, K 3, and K i as follows: Z3 = v ^ ( v e c F « ® F « ) (Jp z, ® J(p)P) /p) ( J iy ® F (1) ® vec(S - E)} + 2V n F ^ [F ^ ® 2Vp(]p ®vec S ' 1 ® ],)' {jX1) ®vec(3 - S )}] F ^11)' {J#: ® ( S ' 1 ® S ' 1) vec(3 - S )} - 2 Vn F ^ 1)' (Jv ® S ' 1 ® vec S"1 ® S"1)' { (Jv ® F ^ ) N v ® vec(g - S ) } - V n F ^ ( F ^ ® vec(5 — S)} ; K 3 - 2 F (1)' ( F ^ ® F ^ ) JVv - F (n > ^Jv ® F (1)) N u F ^ ' F ^ ; FT4 = F (11),{ lv® (S -1^vec S - 1 ® S - 1J7(F w ^ F w )J x {4(NV ® Jv) + 2 J(V3,V)} + 4 F^^' (F w ® F^)) j (Jv ® N v) + ^ J(v,v,) j - F (1)/ (S "1 ® vec S - 1 ® vec S " 1 ® S ”1) ( F (1) ® F (1) ® F (1)) x {2 J(VjV2) + 3 1(v2,v) +4(Nv ® I v)} - F (111)' ( JV2 ® F w J I (Jv ® N v) + ^ J(V,V3) j - F (11)' ( Jv ® F w J j (Nv ® Jv) + ^ J(V,,V) I - ^ F ^ )' F w ; where N v = \ ( I vi + J(v>v)). 108 CHAPTER 5 SIMULATION In this chapter, the performance of the likelihood ratio test and Bartlett correction (see Chapter 4) are evaluated and compared to competing tests. Simulation studies were conducted under 2 applications described in Chapter 2. Simulation Study of Variable Reduction (Redundancy) Tests The evaluation is based on test size and power of the test under different condi­ tions. 
The test statistics examined were o Likelihood ratio test (LRT) o Bartlett correction of LRT (BART) o Schott’s (1991) test (Tktq) in (1.16) o Bartlett correction of Tktq (T*) in (1.18) o Bartlett correction of Tktq (T**), where T** = ^ . An expression for c can be found in Schott (1991). The conditions under which the simulations were conducted are described in terms of the number of original variables (p), the number of retained components (k), the number of variables to be tested for redundancy (q), and the degrees of freedom of a Wishart distribution (n). The following conditions were considered: o p = 10, k = 2 and g = 1,4 109 o p = 10, k = A and 5 = 2,5 o p = 15, k = 2 and q = I 0 p = 15, k = A and q = 2. The degrees of freedom, n, varied from. 25 to 75 in the following manner, n E {20,25,30,35,50,75}. The nominal test size used was 0.05. For each simulation condition, Type I error probability was computed from 1000 data sets. The eigenvalues were ordered as follows: Ai < A2 < • • • < Ap. For simplicity, the covariance matrices examined were diagonal and have Ai = • • • = Ap_& = I. The Comparison Test Partition T as r = / X Fn q x m T2I ( m—q )Xm F3I (p—m ) x m Fi2 t?x(p-m) F22 (m —q)X (p—m) F32 (p—m)x(p—m) \ / Consider the special case where F3I = 0. In this section a test of hypothesis (1.21), where A = ^ ^ and M is the first m columns of I p, will be developed. Although this test is not useful in practice (because F3i = 0 is not likely to be satisfied), it is useful in improving the efficiency of the simulations by constructing an estimator of test size whose variance is less than or equal the usual estimator. Suppose that it is known that F3i = 0, A = f 9^ j , and M — ( J . Then, A e Tl(FM) ^ F 'A G Tl(M) HO Let y be an n x p matrix with distribution vec Y ~ N(0, S ® I n) . Then, S = TTp (n, E) and S = FAF' . Partition T as ( T1 T I. If it is known \ n x q n X ( m —q Y3 - ) n X (p—m) that F31 = 0, then A G K (TM ) <=> S A; OO A2 F i1 Fg1 o' o' T122 T132 Fn 0 F2I F22 0 F32 F11A1F11 F11A1Fg1 0 F21A1Fz11 F21A1Fz21 + F22A2Fz22 F22A2Fz32 0 F32A2F22 F32A2F32 T1 jl T3. Accordingly, a test of independence between T1 and Y3 is a test of H0: A G K(TM)- if F31 = 0, A , and M jlTH0 . Partition S as S = S12 S13 q x q gx(p-ro) S21 S22 S23 (ro-s)x? ( m—q ) x ( m —q) (m—g)x(p—m) S31 S32 S33 (p-m)Xg (p—m)x(m—g) (p—m)x(p—m) The test statistic is A1,, — S 11 S i3 S 3i S 33 ISfn||5 33| ’ which is distributed as Wilk’s A, where k = p — m: An approximate F- statistic (Rencher, 1995) is given by (I - A 1J t) dhF = (5.1) I l l which has an approximate F-distribution with Clf1 and d/2 degrees of freedom, where when qk = 2 then t is set equal to I. If min(g, k) is equal to either I or 2, then the F approximation in (5.1) has an exact F-distribution. Anderson (1984) gave the asymptotic distribution of M = - s log(A9)fejn_fe), where s = (n — k) - ~{q - k + 1). The error is of the order O (IV~6). Box (1949) provided the coefficients to be used in term of O (N~6). The cdf of M can be written as dfi = qk, P(M < mo) — Pqk + ^2 (Pqk+4: ~ Pqk) 74 = Y + 19^0 {394 + 3A:4 + IOg2A:2 - 50 (g2 + k2) + 159} , and = f + ™ { (g6 + A;6) — 105 (g4 + fc4) + 1, 113 («2 + P ) + (2Ig2 - 350 + 21&2) q2k2 - 2,995}. Te 112 Variance Reduction Procedure Consider two test statistics, namely the target test, L, and the comparison test, F. Suppose there are two outcomes of testing any hypothesis: rejection of the null hypothesis denoted by I and failure to reject the null hypothesis denoted by 0. 
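For reference, the comparison statistic of the preceding subsection can be computed directly from the partitioned sample covariance matrix. The following is a minimal Matlab sketch, assuming S, p, m, and q are in the workspace and that the variables are ordered as in the partition above.

% Sketch: Wilks' Lambda for independence of the first q and the last p - m
% variables (the comparison test); S, p, m, q are assumed to be defined.
i1 = 1:q;                              % indices of the block S11
i3 = (m + 1):p;                        % indices of the block S33 (k = p - m variables)
S11 = S(i1, i1);  S13 = S(i1, i3);  S33 = S(i3, i3);
Lam = det([S11, S13; S13', S33]) / (det(S11) * det(S33));
% Equivalently, Lam = det(S11 - S13*(S33\S13')) / det(S11).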
Let riij denote the number of observations that belong to the outcomes in the ith row and the j th column of the L and F tests. The 2 x 2 table of the number of observations can be written as follows. F I 0 Total I n n M l2 M i. 0 M2I M22 M2. Total n . i n .2 N Let Pij be the probability of an observation in the ith row and the j th column. Let a and a* be the sizes of the F and L tests, respectively. Because the comparison test is exact (or nearly so), it is known that E(n.i) = aN, whereas the expectation of rii. is unknown. The likelihood function, conditional on n.i, can be written as L(PK1) = ( ^ n.2 nn P22 1 — a n.2—ni2 P22 I — a ni2 5 where a = Pu + P21 and I — a = Pi2 + T22. 113 Then, the loglikelihood function, conditional on n.i, is = nn In ^ ^ + (n.i - nn) In ^ l — + (n.2 — ni2) Ir + n i2 In I PtI — a + constants. The MLE of Pu, Pu, can be obtained by solving af(P|n.i) 0. Therefore, A r = n.i Similarly, by solving = 0, the MLE of Pi2, Pi2 is Pu = ^ ( 1 - a). 71.2 Accordingly, an estimator of a*, say a, is a = Pu + Pi2 = — a + — (I - a), n.i n.2 The expectation and variance of a, conditional on n.i, are E (oi|n.i) = Pu + P l2 = a* and var (a|n.i) = - P u ( I - — ^ + -— -P i2 I I -n.i n.2 Pr I — CK Accordingly, P22 I — CK var ( ck) = E [var (a |n .i)]. 114 Because n.i ~ B(N, a) and n.2 ~ B(N, I — a), then using E K 1) E (n^1) aN and (I - a)N yields var (a) « 01 01 ^ (l - (p2) , where a* = P11 + P12 and 1P n.xn.2ni.n2. Let d = % be the usual estimator of a*. Then, var (a) a* (I - a*)N The relative efficiency of a to d is given by var (d) _ var (a) (I — (p2) Therefore, the effective sample size, N*, is N ( i - y f ) ' All computations were executed using Matlab programs. Details including the programming code are given in Appendices. The test statistic T* in (1.18) was pro­ grammed using Schott’s (1991) expression for c. However, the value of c, using the data from Example I, is not identical to his result for the same data set and this 115 leads to different value of T*. The program was checked thoroughly and no errors were found. Professor Schott was asked about the discrepancy and he responded that he have been unable to locate his code and thus this thesis should proceed under the assumption that the program used is correct. The following tables report the results of simulations. A cell in each table presents the following quantities: for a test with nominal a — 0.05. / ' 116 Table 5. Probabilities of Type I error when (p, k, q) = (10, 2, 4) and A = (lg, 10, 25)'. n LRT BART F Tk,q T* rp** 20 0.0610 0.0440 0.0470 0.2690 0.0700 0.1170 0.0625 0.0452 0.2713 0.0728 0.1198 1263 1205 1154 2479 1592 30 0.0820 0.0540 0.0590 0.1780 0.0750 0.0970 0.0763 0.0497 0.1701 0.0662 0.0884 1420 1339 1407 4410 2402 50 0.0560 0.0480 0.0520 0.1100 0.0580 0.0620 0.0546 0.0468 0.1081 0.0560 0.0600 1895 1629 1797 9164 5877 75 0.0510 0.0430 0.0450 0.0860 0.0560 0.0590 0.0546 0.0465 0.0908 0.0609 0.0639 1829 2045 2003 4861 4024 Table 6. Probabilities of Type I error when (p, k, q) — (10,2,4) and A = (lg, 5,10)'. n LRT BART F Tk,q rj-i* rji** 20 0.0680 0.0410 0.0470 0.3050 0.0470 0.1190 0.0695 0.0421 0.3072 0.0491 0.1216 1201 1198 1126 1896 1494 30 0.0750 0.0490 0.0590 0.2050 0.0610 0.0990 0.0694 0.0441 0.1974 0.0535 0.0907 1450 1539 1321 3096 2120 50 0.0570 0.0460 0.0520 0.1170 0.0640 0.0730 0.0556 0.0447 0.1151 0.0621 0.0710 1863 1966 1706 4340 3295 75 0.0560 0.0490 0.0510 0.0960 0.0620 0.0630 0.0553 0.0483 0.0950 0.0611 0.0621 1837 1881 2024 3944 3756 117 Table 7. 
Probabilities of Type I error when (p, k, q) = (10,2,4) and A = (1'8, 2,10)'. n LRT BART F Tk,q T* n^** 20 0.1520 0.0460 0.0550 0.5460 0.0320 0.1920 0.1495 0.0441 0.5437 0.0311 0.1894 1108 1199 1046 1056 1103 30 0.1200 0.0490 0.0580 0.4140 0.0370 0.1530 0.1156 0.0458 0.4093 0.0351 0.1481 1185 1238 1083 1092 1183 50 0.0910 0.0520 0.0520 0.2630 0.0550 0.1080 0.0898 0.0511 0.2614 0.0541 0.1066 1289 1257 1181 1236 1306 75 0.0670 0.0420 0.0450 0.1760 0.0580 0.0770 0.0703 0.0447 0.1802 0.0610 0.0808 1411 1446 1264 1378 1530 Table 8. Probabilities of Type I error when (p, k, q) — (10,2,4) and A = (1'8, 1.5,10)'. n LRT BART F Tk,q P^* 20 0.1690 0.0390 0.0470 0.6550 0.0400 0.2530 0.1701 0.0395 0.6560 0.0401 0.2540 1048 1031 1023 1001 1027 30 0.1560 0.0490 0.0440 0.5680 0.0620 0.2200 0.1590 0.0513 0.5706 0.0630 0.2226 1087 1146 1032 1022 1048 50 0.1350 0.0450 0.0470 0.4500 0.0750 0.1860 0.1366 0.0457 0.4516 0.0758 0.1873 1119 1065 1054 1052 1057 75 0.1160 0.0450 0.0440 0.3860 0.0740 0.1400 0.1194 0.0474 0.3893 0.0762 0.1427 1152 1190 1056 1094 1075 118 Table 9. Probabilities of Type I error when (p, k, q) — (1 0 ,2 ,1) and A = (lg, 10,25)' n LRT BART F Tk,q T* rji** 20 0.0660 0.0480 0.0470 0.0870 0.0600 0.0650 0.0681 0.0498 0.0899 0.0630 0.0679 1528 1537 2072 4398 3441 30 0.0640 0.0530 0.0480 0.0760 0.0540 0.0560 0.0655 0.0545 0.0779 0.0560 0.0580 1804 1953 2584 8568 6664 50 0.0530 0.0500 0.0510 0.0680 0.0560 0.0560 0.0522 0.0492 0.0670 0.0550 0.0550 2340 2369 3796 7614 7614 75 0.0520 0.0500 0.0500 0.0580 0.0520 0.0520 0.0520 0.0500 0.0580 0.0520 0.0520 2967 3241 6887 24700 24700 Table 10. Probabilities of Type I error when (p, &, q) = (10, 2 ,1) and A = (lg, 5,10)' n LRT BART F Tk,q 20 0.0590 0.0400 0.0530 0.0860 0.0580 0.0620 0.0575 0.0388 0.0833 0.0555 0.0595 1316 1258 2098 2785 2476 30 0.0650 0.0550 0.0570 0.0700 0.0540 0.0540 0.0598 0.0502 0.0635 0.0480 0.0480 1988 1947 3505 4310 4310 50 0.0560 0.0540 0.0490 0.0650 0.0580 0.0580 0.0568 0.0548 0.0660 0.0590 0.0590 2670 2617 3863 5024 5024 75 0.0560 0.0490 0.0520 0.0580 0.0560 0.0560 0.0542 0.0474 0.0561 0.0541 0.0541 3935 2806 6890 6704 6704 119 Table 11. Probabilities of Type I error when (p, k, q) = (10, 2 ,1) and A = (lg, 2,10)' n LRT BART F Tk,q T* rp** 20 0.0970 0.0560 0.0530 0.1250 0.0730 0.0790 0.0960 0.0552 0.1231 0.0717 0.0775 1069 1080 1236 1168 1202 30 0.0890 0.0580 0.0470 0.1020 0.0660 0.0710 0.0909 0.0594 0.1044 0.0675 0.0727 1270 1226 1432 1234 1286 50 0.0790 0.0540 0.0560 0.0780 0.0710 0.0720 0.0751 0.0508 0.0727 0.0665 0.0674 1451 1408 2336 1849 1905 75 0.0570 0.0480 0.0510 0.0630 0.0560 0.0580 0.0563 0.0473 0.0622 0.0553 0.0572 2003 1919 2393 2043 2208 Table 12. Probabilities of Type I error when (p, k, q) = (10, 2 ,1) and A = (lg, 1.5,10)' n LRT BART F Tk,q T* 20 0.1210 0.0710 0.0520 0.1520 0.0850 0.0940 0.1204 0.0705 0.1509 0.0844 0.0934 1049 1040 1130 1050 1057 30 0.1220 0.0700 0.0530 0.1270 0.0880 0.1050 0.1205 0.0686 0.1252 0.0870 0.1038 1138 1199 1197 1071 1087 50 0.1080 0.0740 0.0490 0.1220 0.0920 0.089 0.1086 0.0745 0.1226 0.0924 0.0894 1224 1206 1203 1085 1101 75 0.0910 0.0640 0.0400 0.0920 0.0670 0.0660 0.0971 0.0698 0.0989 0.0728 0.0716 1207 1280 1275 1261 1238 120 Table 13. Probabilities of Type I error when ( p , k , q ) = (10,4,2) and A (1^,10,10,10,25)'. 
n LET BART F Tk,q T* 20 0.1350 0.0610 0.0480 0.2910 0.0700 0.1240 0.1362 0.0620 0.2925 0.0718 0.1258 1151 1228 1140 2397 1553 30 0.0870 0.0530 0.0430 0.1850 0.0550 0.0750 0.0920 0.0570 0.1910 0.0614 0.0818 1354 1374 1246 2960 2243 50 0.0710 0.0400 0.0390 0.1150 0.0500 0.0580 0.0787 0.0460 0.1251 0.0606 0.0688 1385 1409 1454 3703 2933 75 0.0640 0.0450 0.0400 0.0920 0.0490 0.0510 0.0717 0.0521 0.1015 0.0586 0.0609 1604 1812 1698 4286 4450 Table 14. Probabilities of Type I error when (p,k,q) = (10,4,2) and A ( I ', 5,5,10,10)'. n LRT BART F Tk tq T* rj-*** 20 0.1620 0.0630 0.0480 0.3380 0.0620 0.1400 0.1631 0.0639 0.3394 0.0635 0.1418 1124 1173 1109 1857 1418 30 0.0950 0.0490 0.0430 0.2130 0.0510 0.0810 0.0999 0.0522 0.2188 0.0566 0.0870 1309 1228 1199 2175 1700 50 0.0710 0.0460 0.0390 0.1290 0.0540 0.0670 0.0784 0.0519 0.1390 0.0639 0.0774 1346 1330 1377 2500 2148 75 0.0650 0.0490 0.0400 0.1000 0.0510 0.0530 0.0727 0.0558 0.1094 0.0606 0.0626 1588 1610 1600 3773 3398 121 Table 15. Probabilities of Type I error when (p, k, q) = (10,4,2) and A (T6, 2,5,10,10)'. n LRT BART F Tk,q T* 20 0.2140 0.0650 0.0440 0.4570 0.0820 0.1810 0.2171 0.0670 0.4603 0.0842 0.1844 1070 1084 1052 1080 1102 30 0.1340 0.0550 0.0460 0.3230 0.0630 0.1170 0.1365 0.0568 0.3258 0.0648 0.1199 1179 1199 1112 1184 1282 50 0.0820 0.0430 0.0390 0.1870 0.0580 0.0890 0.0884 0.0478 0.1963 0.0644 0.0965 1202 1208 1214 1300 1273 75 0.0850 0.0590 0.0500 0.1520 0.0700 0.0840 0.0850 0.0590 0.1520 0.0700 0.0840 1344 1520 1415 2018 1764 Probabilities of Type I error when (p, k,q) = (10,4, 10)'. n LET BART F Tk,q T* ji** 20 0.2320 0.0650 0.0470 0.5070 0.0760 0.1980 0.2336 0.0661 0.5085 0.0769 0.1994 1078 1103 1045 1060 1064 30 0.1610 0.0640 0.0450 0.3940 0.0890 0.1570 0.1642 0.0661 0.3972 0.0911 0.1599 1153 1146 1078 1102 1122 50 0.1260 0.0590 0.0630 0.3280 0.1070 0.1420 0.1164 0.0517 0.3192 0.1020 0.1366 1425 1510 1151 1106 1095 75 0.0940 0.0500 0.0400 0.2190 0.0750 0.1070 0.1011 0.0547 0.2269 0.0786 0.1124 1293 1216 1161 1079 1132 122 Table 17. Probabilities of Type I error when (p, k, q) = (10,4,5) and A (T6, 10,10,10,25)'. n LRT BART F Tk,q T* n^** 20 0.1560 0.0740 0.0800 0.7100 0.0840 0.3190 0.1370 0.0593 0.7005 0.0590 0.2968 1287 1347 1036 2968 1227 30 0.1120 0.0600 0.0650 0.4460 0.0890 0.1770 0.1019 0.0518 0.4371 0.0746 0.1638 1378 1469 1094 3202 1477 50 0.0750 0.0550 0.0500 0.2470 0.0660 0.0890 0.0750 0.0550 0.2470 0.0660 0.0890 1605 1905 1191 3918 2167 75 0.0660 0.0510 0.0430 0.1750 0.0570 0.0700 0.0720 0.0568 0.1810 0.0639 0.0768 1950 2346 1268 3896 2481 Table 18. Probabilities of Type I error when (p,k,q) = (10,4,5) and A (%, 5,5,10,10)'. n LRT BART F Tk,q T* rji** 20 0.1560 0.0460 0.0710 0.7980 0.0410 0.3500 0.1433 0.0397 0.7934 0.0333 0.3356 1224 1155 1019 1288 1157 30 0.1170 0.0560 0.0640 0.5180 0.0670 0.1880 0.1078 0.0489 0.5108 0.0568 0.1759 1337 1412 1067 2041 1419 50 0.0760 0.0500 0.0500 0.2890 0.0650 0.0980 0.0760 0.0500 0.2890 0.0650 0.0980 1591 1701 1148 2967 1939 75 0.0620 0.0480 0.0440 0.2070 0.0590 0.0730 0.0670 0.0525 0.2120 0.0649 0.0788 2034 2123 1214 3760 2406 123 Table 19. Probabilities of Type I error when ( p , k , q ) — (10,4,5) and A ( I z6, 2 ,5 ,10 ,10 )'. 
n LRT BART F Tk,q T* J - I * * 20 0.1800 0.0340 0.0790 0.9010 0.0460 0.4900 0.1637 0.0271 0.8979 0.0387 0.4763 1183 1143 1009 1118 1069 30 0.1340 0.0430 0.0640 0.7620 0.0730 0.3220 0.1243 0.0385 0.7584 0.0680 0.3123 1327 1176 1021 1126 1150 50 0.0930 0.0460 0.0500 0.4940 0.0740 0.1590 0.0930 0.0460 0.4940 0.0740 0.1590 1492 1413 1057 1430 1295 75 0.0690 0.0470 0.0460 0.3230 0.0780 0.1200 0.0724 0.0500 0.3258 0.0810 0.1233 1927 2215 1112 1547 1402 Probabilities of Type I error when (p, k, q) = (10,4, 10)'. n LRT BART F Tk,q T* r j p * * 20 0.2020 0.0250 0.0630 0.9450 0.0560 0.5740 0.1936 0.0220 0.9442 0.0541 0.5694 1182 1143 1003 1023 1031 30 0.1620 0.0260 0.0680 0.8490 0.0990 0.4540 0.1515 0.0214 0.8461 0.0955 0.4454 1189 1196 1013 1027 1061 50 0.1080 0.0220 0.0490 0.7180 0.1260 0.3170 0.1087 0.0223 0.7183 0.1264 0.3175 1338 1285 1020 1066 1057 75 0.1040 0.0420 0.0520 0.5620 0.1330 0.2540 0.1022 0.0408 0.5611 0.1321 0.2527 1763 1720 1044 1103 1121 124 Table 21. Probabilities of Type I error when (p, k, q) = (15 ,2 ,1) and A = (Iz13, 15,35)' n LRT BART F Tk,q T* n^** 25 0.0610 0.0470 0.0580 0.0650 0.0450 0.0470 0.0606 0.0468 0.0647 0.0455 0.0465 1001 1000 1001 1004 1004 35 0.0490 0.0390 0.0520 0.0590 0.0410 0.0420 0.0489 0.0389 0.0580 0.0409 0.0419 1002 1004 1001 1001 1001 50 0.0380 0.0310 0.0460 0.0500 0.0370 0.0370 0.0381 0.0311 0.0503 0.0373 0.0373 1001 1000 1006 1007 1007 75 0.0400 0.0370 0.0560 0.0410 0.0380 0.0380 0.0396 0.0366 0.0405 0.0374 0.0374 1007 1008 1010 1012 1012 Table 22. Probabilities of Type I error when (p, k, q) = (15,2,1) and A = (1'13, 10,15)' n LRT BART F Tk,q T* J-*** 25 0.0620 0.0460 0.0580 0.0610 0.0480 0.0490 0.0616 0.0458 0.0606 0.0475 0.0485 1001 1000 1001 1004 1003 35 0.0470 0.0380 0.0520 0.0580 0.0440 0.0450 0.0469 0.0370 0.0579 0.0439 0.0449 1003 1005 1001 1001 1001 50 0.0370 0.0310 0.0460 0.0480 0.0380 0.0380 0.0461 0.0394 0.0618 0.0520 0.0520 1001 1001 1004 1006 1006 75 0.0560 0.0480 0.0460 0.0680 0.0600 0.0600 0.0563 0.0482 0.0684 0.0602 0.0602 1005 1001 1005 1002 1002 125 Table 23. Probabilities of Type I error when (p, k, q) — (15, 2 ,1) and A = (Iz13, 2,15)'. (1'13,1.5,15)'. n LRT BART F Tk,q T* J-*** 25 0.1000 0.0740 0.0580 0.0830 0.0630 0.0650 0.0995 0.0735 0.0822 0.0625 0.0644 1002 1003 1006 1003 1005 35 0.0800 0.0570 0.0520 0.0720 0.0540 0.0570 0.0798 0.0568 0.0719 0.0539 0.0569 1009 1013 1000 1001 1003 50 0.0900 0.0650 0.0450 0.0670 0.0640 0.0660 0.0901 0.0650 0.0668 0.0638 0.0658 1000 1000 1001 1001 1001 75 0.0660 0.0550 0.0560 0.0470 0.0400 0.0410 0.0655 0.0548 0.0466 0.0397 0.0407 1005 1001 1004 1003 1003 Probabilities of Type I error when (p, k, q) = (15,2, n LRT BART F Tk,q rj-*** 25 0.1230 0.0850 0.0580 0.0890 0.0650 0.0710 0.1221 0.0844 0.0881 0.0645 0.0701 1005 1003 1007 1003 1009 35 0.1000 0.0620 0.0520 0.0910 0.0730 0.0790 0.0998 0.0618 0.0909 0.0728 0.0788 1007 1008 1002 1005 1004 50 0.1060 0.0780 0.0530 0.0860 0.0860 0.0820 0.1059 0.0778 0.0859 0.0860 0.0820 1001 1002 1001 1000 1000 75 0.0930 0.0640 0.0560 0.0740 0.0620 0.0700 0.0923 0.0635 0.0736 0.0616 0.0694 1007 1006 1004 1004 1007 126 Table 25. Probabilities of Type I error when ( p , k , q ) = (15,4,2) and A (I ' , , 15,15,15,35)'. 
n LRT BART F Tk,q T* J-*** 25 0.1360 0.0700 0.1520 0.2080 0.0690 0.0930 0.1247 0.0626 0.1919 0.0623 0.0820 1013 1010 1020 1008 1018 35 0.1050 0.0640 0.0830 0.1440 0.0710 0.0860 0.1005 0.0602 0.1379 0.0675 0.0809 1015 1016 1021 1013 1024 50 0.0740 0.0440 0.0660 0.0113 0.0580 0.0620 0.0722 0.0419 0.0110 0.0564 0.0602 1012 1025 1009 1011 1013 75 0.0770 0.0630 0.0640 0.0910 0.0610 0.0620 0.0744 0.0604 0.0882 0.0586 0.0597 1024 1035 1030 1030 1029 Table 26. Probabilities of Type I error when (p,k,q) = (15,4,2) and A (T11, 10,10,15,15)'. n LRT BART F Tk,q T* rji** 25 0.1350 0.0710 0.1520 0.2100 0.0710 0.0940 0.1243 0.0629 0.1933 0.0637 0.0839 1012 1012 1021 1010 1014 35 0.1060 0.0680 0.0830 0.1530 0.0710 0.0820 0.1016 0.0639 0.1468 0.0671 0.0776 1014 1018 1021 1016 1018 50 0.0750 0.0440 0.0660 0.1120 0.0580 0.0630 0.0729 0.0419 0.1100 0.0564 0.0612 1015 1025 1009 1011 1013 75 0.0740 0.0620 0.0640 0.0880 0.0610 0.0610 0.0714 0.0594 0.0853 0.0586 0.0586 1031 1036 1027 1030 1030 127 Table 27. Probabilities of Type I error when ( p , k , q ) = (15,4,2) and A (In , 2,10,15,15)'. n LRT BART F Tk,q T* rji** 25 0.1970 0.0990 0.1520 0.2950 0.0550 0.1110 0.1843 0.0880 0.2830 0.0490 0.1014 1012 1017 1008 1008 1011 35 0.1450 0.0880 0.0830 0.2320 0.0540 0.0820 0.1407 0.0851 0.2273 0.0520 0.0784 1010 1007 1008 1005 1011 50 0.1030 0.0590 0.0660 0.1650 0.0510 0.0740 0.1009 0.0572 0.1619 0.0493 0.0724 1012 1015 1071 1015 1008 75 0.0960 0.0600 0.0640 0.1240 0.0650 0.0690 0.0937 0.0580 0.1209 0.0636 0.0672 1019 1020 1026 1009 1015 Probabilities of Type I error when (p,k,q) = (15,4, 5,15)'. n LRT BART F Tk,q T* 25 0.2290 0.1160 0.1520 0.3460 0.0650 0.1300 0.2162 0.1054 0.3386 0.0617 0.1227 1011 1013 1003 1002 1005 35 0.1790 0.0870 0.0830 0.2970 0.0660 0.1080 0.1746 0.0854 0.2912 0.0653 0.1054 1009 1002 1011 1000 1005 50 0.1230 0.0700 0.0660 0.2340 0.0760 0.0910 0.1210 0.0683 0.2323 0.0752 0.0894 1009 1010 1003 1002 1007 75 0.1240 0.0780 0.0640 0.1940 0.0750 0.0980 0.1216 0.0761 0.1911 0.0743 0.0967 1015 1015 1017 1002 1006 128 The following tables report power of the tests when p = 10, k — 2 ,4, and q = 1,2. Figues compare the power of the BART and T* tests both of which are controlled for Type I error. Figure 3. Power of test when A = (lg, 10,25)', k = 2, q = I. BART - 0 - Tstar Table 29. Power of test when A = (1'8, 10,25)', k — 2, q — l. Power of test n LRT BART Tk,q rji* 25 0.3330 0.2980 0.3550 0.2880 0.2960 50 0.5890 0.5570 0.5720 0.5390 0.5410 75 0.7480 0.7410 0.7540 0.7290 0.7300 100 0.8580 0.8510 0.8540 0.8450 0.8450 150 0.9620 0.9620 0.9640 0.9640 0.9640 129 Figure 4. Power of test when A = (Ix8, 2,10)', k — 2, q = I. BART -Q - Tslar Table 30. Power of test when A = (18, 2,10)', k — 2, q — I. Power of test n LRT BART Tk,q rji** 25 0.3150 0.2410 0.3200 0.1710 0.2000 50 0.4580 0.4050 0.4620 0.3230 0.3640 75 0.5960 0.5580 0.5870 0.4940 0.5180 100 0.6930 0.6540 0.6610 0.6030 0.6240 150 0.8750 0.8600 0.8660 0.8490 0.8530 130 Figure 5. Power of test when A = (lg, 10,10,10,25)', A; = 4, g = 2. BART Tstar Table 31. Power of test when A = (1'6, 10,10,10,25)', k = A, q = 2. Power of test n LRT BART Tk,q J-*** 25 0.2750 0.1690 0.4130 0.1680 0.2330 50 0.3420 0.2700 0.4150 0.2710 0.2940 75 0.4990 0.4360 0.5370 0.4470 0.4600 100 0.6130 0.5820 0.6360 0.5780 0.5800 150 0.8230 0.8070 0.8230 0.8010 0.8040 131 Figure 6. Power of test when A = (lg, 2, 5,10,10)', = 4, q = 2. Q- 0.4 BART -Q - Tstar Table 32. Power of test when A = (lg, 2,5,10,10)', k — 4, q = 2. 
Power of test
  n      LRT      BART     Tk,q     T*       T**
  50     0.2900   0.1940   0.4070   0.1680   0.2200
  75     0.3770   0.3010   0.4570   0.2550   0.2870
  100    0.4840   0.4300   0.5320   0.3900   0.4160
  150    0.6240   0.5840   0.6510   0.5590   0.5710
  175    0.7020   0.6720   0.7160   0.6620   0.6670
  200    0.7890   0.7650   0.8000   0.7530   0.7550

Discussion

The results of the simulation study of the variable reduction tests reported in Tables 5-28 show that the likelihood ratio test (LRT) can be improved by a Bartlett correction, especially when λ_k is quite small (see Tables 7, 8, 11, 12, 15, 16, 19, 20, 23, 24, 27, and 28). In these cases, the LRT becomes liberal whereas the BART is still conservative. With a few exceptions in Tables 8 and 19-22, the BART gives probabilities of Type I error between 0.04 and 0.10.

In general, the LRT performs better than T_{k,q}, especially when k + q is large relative to p. In these cases, the LRT continues to perform well whereas T_{k,q} becomes liberal (see Tables 17-20). The T_{k,q} test gives probabilities of Type I error up to 0.945 (see Table 20). The T** test does not perform nearly as well as T* and BART unless k + q is small (see Tables 9-12 and 21-24) or the sample size is quite large. Under most conditions, the BART results are better than those of T*.

The effective sample size, N*, increases with the sample size. The comparison test is useful when q is not far from m = p − k. If q is far from m, N* is close to N and the relative efficiency is almost 1 (see Tables 21-28).

With respect to power, it is sufficient to compare the BART with T*, both of which are controlled for Type I error. The results are reported in Figures 3-6. Only a few cases were examined. These figures show that the Bartlett-corrected test has greater power than T*, especially when λ_k is quite small (see Figures 4 and 6).

Simulation Study of Component Reduction Tests

The evaluation is based on the percentage coverage of confidence intervals for the cumulative proportion of total variance explained by the first k components, under different conditions. The intervals examined are

o the likelihood ratio interval (LR I),
o the Bartlett-corrected interval (BAR I), and
o Anderson's interval (Z I), obtained by inverting (1.6).

To obtain the LR I and BAR I, a confidence interval for this proportion can be constructed by inverting the LRT or BART of H_0: the proportion equals c_0 against the two-sided alternative. The interval consists of all values c_0 for which H_0 cannot be rejected. That is, a (1 − α)100% CI can be obtained as {c_0 : Q(c_0) ≤ χ²_{1, 1−α}}, where Q is the LRT or BART statistic.

In this simulation study, the LR I and BAR I are obtained by applying the fminsearch command in Matlab (see the Matlab subprograms in the Appendices). This subprogram finds a local minimizer c_0 of the quantity (p-value − α)². The interval (c_{01}, c_{02}) is found such that the p-values for testing H_0 at c_{01} and at c_{02} are each equal to α. The endpoints of Anderson's intervals are used as initial guesses.
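For illustration, the inversion can be sketched in a few lines of Matlab. This is a minimal sketch rather than the thesis subprogram: pvalue_fn, z_lower, and z_upper are hypothetical placeholders for, respectively, a routine that returns the LRT (or BART) p-value for testing H_0 at a trial value c_0 and the endpoints of Anderson's interval.

  % Minimal sketch (not the thesis subprogram) of the interval inversion
  % described above.  pvalue_fn(c0) is assumed to return the LRT or BART
  % p-value for testing H0 at the trial value c0; z_lower and z_upper are
  % the endpoints of Anderson's interval, used as initial guesses.
  alpha = 0.05;
  obj   = @(c0) (pvalue_fn(c0) - alpha)^2;   % quantity minimized in the text
  c01   = fminsearch(obj, z_lower);          % lower endpoint of the interval
  c02   = fminsearch(obj, z_upper);          % upper endpoint of the interval
  ci    = [c01, c02];                        % approximate (1 - alpha)100% CI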
The number of original variables, p, was 8. The number of retained components, k, was 2 and 4. For each simulation condition, 1000 data sets were generated. For simplicity, the covariance matrices examined were diagonal with λ_1 = ··· = λ_{p−k} = 1; the values of the remaining eigenvalues are listed in the tables below. The vector c_1 was chosen so that the parameter of interest is the cumulative proportion of total variance explained by the first k components. The nominal test size was 0.05. Tables of the percentage coverage of the confidence intervals are shown below.

Table 33. Percentage coverage of CI when (p, k) = (8, 2) and λ = (1'_6, 10, 25)'.
  n     LR I     BAR I    Z I
  20    90.70    92.40    88.90
  75    94.10    94.90    93.90

Table 34. Percentage coverage of CI when (p, k) = (8, 2) and λ = (1'_6, 2, 10)'.
  n     LR I     BAR I    Z I
  20    84.70    90.10    84.90
  75    92.30    93.50    92.60

Table 35. Percentage coverage of CI when (p, k) = (8, 4) and λ = (1'_4, 10, 10, 10, 25)'.
  n     LR I     BAR I    Z I
  20    78.80    93.90    72.50
  75    90.40    94.90    89.90

Table 36. Percentage coverage of CI when (p, k) = (8, 4) and λ = (1'_4, 2, 10, 10, 10)'.
  n     LR I     BAR I    Z I
  20    62.30    91.00    60.10
  75    87.40    93.60    87.40

The results of the component reduction simulation reported in Tables 33-36 suggest that the Bartlett-corrected intervals perform better than the other intervals. The percentage coverage of the Bartlett-corrected intervals is closer to 95% than that of the likelihood ratio intervals and Anderson's intervals in all conditions.

Numerical Examples

Example 1. (continued) (The study of the quality of pictures)

In Chapter 1, the following two questions were addressed.
1. How many components should be retained?
2. How many and which variables can be considered redundant?

To answer these questions, a confidence interval was computed with c_1 = (0 0 0 0 0 0 0 1 1)'; that is, the CI for the cumulative proportion of total variance accounted for by the first two components. The 95% confidence interval is (0.7422, 0.7690). It can be concluded that the first two components are sufficient to explain the variability.

The focus now is on the first two components. From Table 2, the first two components have small coefficients on the last three variables. The question is whether these three variables can be eliminated. Consider hypothesis (1.21), where k = 2, q = 3, and n = 108; that is, the last three variables have zero coefficients in each of the first two components. The likelihood ratio test statistic is 3.36, which asymptotically follows a χ² distribution with kq = 6 degrees of freedom, and the corresponding p-value is 0.7613. The hypothesis could not be rejected at either the 0.01 or 0.05 significance level. Accordingly, these three variables are redundant and can be discarded. The Bartlett-corrected test statistic is 3.41 and the corresponding p-value is 0.7549. This leads to the same conclusion.

Example 2. (continued) (Women's track records)

In Chapter 1, two questions were addressed: how many components should be retained, and which polynomials contribute to the largest component? To answer these questions, a confidence interval was computed with c_1 = (0 0 0 0 1 1)'. The 95% CI for the cumulative proportion of total variance accounted for by the first two components is (0.7823, 0.8282). Therefore, the first two components should be retained.

To reduce the dimension in the number of variables, hypothesis (1.21) was examined, where k = 1, 2 and q = 1, 2, 3, 4, 5; that is, the first k components have zero loadings on the last q variables.

Table 37. P-values of the hypothesis tests: women's track example.
  k    q    LRT        p-value      BART       p-value
  1    1    0.5877     0.4433       0.6082     0.4355
  1    2    2.5382     0.2811       2.5549     0.2787
  1    3    8.7912     0.0322*      8.5538     0.0359*
  1    4    9.7598     0.0447*      9.3310     0.0533
  1    5    14.8112    0.0112*      13.9096    0.0162*
  2    1    0.6040     0.7393       0.5991     0.7411
  2    2    8.5104     0.0746       8.2538     0.0827
  2    3    31.3926    <0.0001**    30.1616    <0.0001**
Note: * denotes rejection of H_0 at the 0.05 level; ** denotes rejection of H_0 at the 0.01 level.

For each of the hypotheses, the likelihood ratio test statistic, the Bartlett-corrected test statistic (each of which asymptotically follows a χ² distribution with kq degrees of freedom), and the corresponding p-values are presented in Table 37. Failure to reject H_0 means that, in the first k components, all variability in speed over track events is related to the first p − q polynomials; the last q polynomials are redundant.
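As a check on the arithmetic in these examples, the reported p-values can be reproduced with a few lines of Matlab. The sketch below is not part of the thesis code; it assumes only that each statistic is referred to a χ² distribution with kq degrees of freedom and uses chi2cdf from the Statistics Toolbox, the same function used in the programs in the Appendices.

  % Minimal sketch: reproduce the Example 1 p-values (up to rounding of
  % the reported statistics) by referring each statistic to a chi-square
  % distribution with k*q degrees of freedom.
  k = 2;  q = 3;  df = k*q;            % hypothesis (1.21) for Example 1
  lrt  = 3.36;                         % likelihood ratio statistic
  bart = 3.41;                         % Bartlett-corrected statistic
  p_lrt  = 1 - chi2cdf(lrt,  df);      % approximately 0.76
  p_bart = 1 - chi2cdf(bart, df);      % approximately 0.76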
CHAPTER 6

CONCLUSION

In this thesis, new procedures for examining dimension reduction in principal component analysis were developed in Chapters 2, 3, and 4. The Bartlett-corrected test of redundancy appears to be promising. The tabulated results show an improvement in Type I error control for the Bartlett-corrected test when compared to the likelihood ratio test and other competing tests. Figures 3-6 show that the Bartlett-corrected test has power as good as or better than the competing test in all conditions that were examined.

The Bartlett-corrected test also appears to be promising with respect to the percentage coverage of confidence intervals for the cumulative proportion of total variance. The results show that the Bartlett-corrected intervals substantially improve control of the coverage percentage under the conditions examined.

There are some limitations of this thesis. First, the procedures for estimating MLEs, for constructing LRTs, and for evaluating Bartlett corrections are derived under normality; the results do not apply to non-normal data. Second, the proposed model is based on the covariance matrix; it does not apply to the correlation matrix. A third limitation is computational intensity. The computation of MLEs and the construction of LRTs are fast, but computing the Bartlett correction is computationally intensive, and the computer time increases as p increases. This may not be noticeable when computing on just one data set, but a simulation study may take a long time. Efficient programming is needed to shorten the computer time.

For future research, it would be useful to construct one-sided confidence intervals for the cumulative proportion of total variance. Deriving Edgeworth approximations and saddlepoint approximations for the distributions of estimators of eigenvectors and eigenvalues could also be an interesting topic. To shorten the computer time, algebraic simplification of the Bartlett equations could be pursued. Under non-normality, adopting robust methods such as M-estimators could be an important extension. Another interesting topic would be to examine dimension reduction in a correlation matrix. Finally, it would be useful to develop guidelines for a general set of methods for exploratory and confirmatory principal component analysis. In particular, for confirmatory PCA, which involves several types of hypothesis tests, future research could be driven by the demands of applications. The present research is one part of confirmatory PCA.

REFERENCES

[1] Anderson, T.W. (1963). Asymptotic theory for principal component analysis. Annals of Mathematical Statistics, 10, 203-224.
[2] Anderson, T.W. (1984). Introduction to Multivariate Statistical Analysis, Second edition. New York: Wiley.
[3] Bartlett, M.S. (1954). A note on the multiplying factors for various χ² approximations. Journal of the Royal Statistical Society, Series B, 16, 296-298.
[4] Beale, E.M.L., Kendall, M.G. and Mann, D.W. (1967). The discarding of variables in multivariate analysis. Biometrika, 54, 357-366.
[5] Bentler, P.M. and Yuan, K.-H. (1996). Test of linear trend in eigenvalues of a covariance matrix. The British Journal of Mathematical and Statistical Psychology, 49, 299-312.
[6] Boik, R.J. (1998). A local parameterization of orthogonal and semi-orthogonal matrices with applications. Journal of Multivariate Analysis, 67, 244-276.
[7] Boik, R.J. (2002a). Spectral models for covariance matrices. Biometrika, 89, 159-182.
[8] Boik, R.J. (2002b). Principal components models for correlation matrices, in press.
[9] Box, G.E.P. (1949).
A general distribution theory for a class of likelihood criteria. Biometrika, 36, 317-346. [10] Cattell, R.B. (1966). The scree test for the number of factors. Journal of Multi­ variate Behav. Res., I, 245-276. [11] Cliff, N. (1966). Orthogonal rotation to congruence. Psychometrika, 31, 33-42. [12] Cox, D.R. and Hinkley, D.V. (1974). Theoretical Statistics. Chapman and Hall. [13] Dawkins, B. (1989). Multivariate analysis of national track records. The Ameri­ can Statistician, 43, 110-115. 141 [14] Diimbgen L. (1995). Likelihood ratio tests for principal components. Journal of Multivariate Analysis, 52, 245-258. [15] Flury B.N. (1986). An asymptotic test for redundancy of variables in the compar­ ison of two covariance matrices. Statistics and Probability Letters, 4, 123-126. [16] Flury B.N. (1988). Common Principal Components and Related Multivariate Models. New York: Wiley. [17] Fujikoshi, Y. (1985). Selection of variables in discriminant analysis and canonical correlation analysis. Multivariate Analysis-VI, P.K. Krishnaiah, ed., Elsevier, 219-236. [18] Fujikoshi, Y. (1989). Tests for redundancy of some variables in multivariate anal­ ysis. Statistical data analysis and inference, New York: Elsevier. [19] Fujikoshi, Y. and Rao, C.R. (1991). Selection of covariance in the growth curve model. Biometrika, 78, Issue 4, 779-785. [20] Gleason, T.C. (1976). On redundancy in canonical analysis. Psychological, 83, No.6, 1004-1006. [21] Huang, D. and Tseng, S. (1992). A decision procedure for determining the num­ ber of components in principal component analysis. Journal of Statistical Planning and Inference, 30,63-71. [22] Jackson, J.E. and R.H. Morris (1957). An application of multivariate quality control to photographic processing. Journal Amer. Statist. Assoc, 52,186- 199. [23] JolliTe, I.T. (1972). Discarding variables in a principal component analysis, I: Artificial data. Appl Statist., 21, 160-173. [24] JolliTe, I.T. (1973). Discarding variables in a principal component analysis, II: Real data. Appl Statist., 22, 22-31. [25] JolliTe, I.T. (1986). Principal Component Analysis. New York: Springer. [26] JolliTe, I.T. and Cadima J. (1995). Loadings and correlations in the interpreta­ tion of principal components. Journal of Applied Statistics, 22, No.2, 203-214. [27] Kaiser, H.F. (1958). The varimax criterion for analytic rotation in factor analysis. Psychometrika, 23, 187-200. 142 [28] Lawley, D.N. (1956). Tests of significance for the latent roots of covariance and correlation matrices. Biometrika, 43, 128-136. [29] Lazraq, A. (2001). Statistical inference concerning several redundancy indices. Journal of Multivariate Analysis, 79, 71-88. [30] Magnus, J.R. and Neudecker, H. (1979). The commutation matrix: some prop­ erties and applications. Ann. Statist., 7 , 381-394. [31] Magnus, J.R. and Neudecker, H. (1999). Matrix Differential Calculus with Ap­ plications in Statistics and Econometrics, Revised Ed. Chichester: Wiley. [32] Mathworks, Inc. (2000). Matlab Version 6.1. Mathworks, Inc. [33] McCabe, G.P. (1984). Principal variables. Technometrics, 26, 137-144. [34] Naik, D.N. and Khattree, R. (1996). Revisiting Olympic track records: Some practical considerations in the principal component analysis. The American Statistician, 50, 140-144. [35] Ouellette, D.V. (1981). Schur complements and statistics. Linear Algebra and its Applications, 36, 187-295. [36] Rencher, A.C. (1995). Methods of Multivariate Analysis. New York: Wiley. [37] Schott, J.R. (1991). 
Testing for the redundancy of variables in principal compo­ nents analysis. Statistics and Probability Letters, 11, 495-501. [38] Seber, G.A.F. (1984). Multivariate Observations. New York: Wiley. [39] Sen, P.K. and Singer, J.M. (1974). Large Sample Methods in Statistics: An in­ troduction with applications. Chapman and Hall. [40] Severini, T.A. (2000). Likelihood Methods in Statistics. Oxford University Press. [41] Sugiyama, T. and Tong, H. (1976). On a statistic useful in dimensionality reduc­ tion in multivariate linear stochastic system. Commun. Statist., AS, 711-721. [42] Taylor, A.E. and Mann, W.R. (1983). Advanced Calculus. New York: Wiley. [43] Tyler, D.E. (1981). Asymptotic inference for eigenvectors, The Annals of Statis­ tic, 9, 725-736. 143 [44] Van Den WolIenberg, A.L. (1977). Redundancy analysis an alternative for canon­ ical correlation analysis. Psychometrika, 42, No.2, 207-219. 144 APPENDICES 145 APPENDIX A Notation 146 Notation diag(o) det tr exp dim O(P) vec(A) dvec(o,p,p) © © %(T) Wp(m,2) H-) £(■) IIXII2 L IP e? I(p,p) D1-S D f Dp N p Ie Ie V'W diagonal matrix having diagonal elements as a vector a determinant trace exponential function dimension group of orthogonal p x p matrices stacking the columns of A into a single column vector converging a vector o to a p x p matrix direct or Kronecker product direct sum the operation is to be performed element-wise subspace spanned by the columns of a matrix T p-variate normal distribution with mean /Lt and covariance matrix S p x p matrix variate Wishart distribution with n degrees of freedom and covariance matrix S likelihood function loglikelihood function norm of X i.e YTi=I X? the matrix p2 x p of YT=I (ej ® ej) the p x p identity matrix the j th column of I p the p2 x p2 commutation matrix the ith derivative of vecG with respect to /V evaluated at p = O the ith derivative of A with respect to ip1 the duplication matrix the matrix ( lp2 + I(p,p)) /2 Fisher’s information the average Fisher’s information digamma function 147 APPENDIX B List of Matlab Subprograms 148 The following list describes the Matlab subprograms used in the simulation. 1. commute.m : Computes the permutation matrix, I(a,6)- 2. dup.m : Computes the duplication matrix, Dv. 3. kron.m : Computes Kronecker products of three or more input matrices. 4. makeJC.m : Constructs the elementary matrix, E i v^. 5. oplus.m : Computes the direct sum of two or more matrices. 6. vec.m : Stacks the columns of a matrix into a single column vector. 7. dvec.m : Converges a column vector into a matrix. 8. Pvechv2.m : Constructs the matrices of Al, A2, L, and {Ip2 — 9. solve_eta.m : Given parameter fi, solves for the implicit parameter r] under model (2.31). 10. solve_etal.m : Given parameter /i, solves for the implicit parameter 77 under model (2.33). 11. solve_eta2.m : Given parameter /n, solves for the implicit parameter r}2 under model (2.66). 12. solve_tau.m : Given parameter tp, solves for the implicit parameter r and com- ■ putes £ under model (2.11). 149 13. find-guess.m : Computes initial guesses for T0 and A0. 14. fisher_scoringl.m : Updates the parameters [i and tp under the parameterization of A in (2.4). 15. fisher_scoring2.m : Updates the parameters fj, and cp under the parameterization of A in (2.9). 16. find_Vstar.m : Computes V* = ( V 3 V4) ensured that the matrix of derivatives in (2.35) is nonsingular. This subprogram is used for estimating parameters. 17. 
find_Vstar_bart.m : Computes V* = (V3 V4) by using the row reduced echelon form to increase the number of zero entries in the matrices. This subprogram is used to compute a Bartlett correction to help reduce memory and run faster. 18. solve_mle.m : Computes the maximum likelihood estimates of T, A, and X. under model (2.31) and the parameterization of A in (2.4). 19. solve_mlel.m : Computes the maximum likelihood estimates of T, A, and S under model (2.33) and the parameterization of A in (2.4). 20. solve_mle2.m : Computes the maximum likelihood estimates of F, A, and X under model (2.31) and the parameterization of A in (2.9). 21. Bartlett.m : Computes E ^D(@; 6) j^ under model (2.31) and the parameteriza­ tion of A in (2.4). 150 22. Bartlettl.m : Computes E yD(0] 0)j under model (2.33) and the parameteri­ zation of A in (2.4). 23. Bartlett2.m : Computes E ^D(0; 0) j under model (2.31) and the parameteri­ zation of A in (2.9). 24. Exp_Da.m : Computes E (d (6; 0) j under an unrestricted H0. 25. schott_func.m : Computes Tk>q, T*, T**, and their p-values. 26. Cl function : Computes a local minimizer Cq such that the p-value for testing Ho: = c0 is equal to a by minimizing (p-value — a)2. 151 APPENDIX C Programming Codes 152 The following codes for Matlab version 5.3 and 6.1 (Mathworks Inc., 2000) can be used to compute the maximum likelihood estimates of F, A, and S , likelihood ratio test statistics, and Bartlett-corrected test statistics, including p-values. Matrices can be stored as sparse matrices to reduce memory. The programming codes numbered 1-9 from the preceding list can be found on BoiMs webpage : . solve_eta2.m function. function [G,eta2] = solve_eta2(A,Al,A2,V3,M,mu,Gain) %% H given a solution for mu, solve for the implicit parameter eta %% %% etal must be zero then solve for eta.2%% 7.7. [p,q]= size(A);Ip = speye(p);Dp = dup(p); 7.7. 7.7. find Mc which is Ip taken off M ’s column 7.7. 7.7. b = M ’* [I :p] ’ ; Mc = Ip ; Mc(:,b) = []; 7.7. 7.7. initial guess of eta2 7.7. 7.7. eta.2 = vech(Ip); C = krqn(Me’,A’*Gam) ; VG = (Al*V3*mu)+(A2*eta2); G = dvec(VG,p,p); 7.7. 7.7. find eta2 using Newton’s method 7.7. 7.7. while norm(G*G’-Ip,’fro’)+ norm(C*vec(G),’fro’) > le-8 F = Dp’*vec(G*G’-Ip); J = 2*Dp’*kron(G,Ip)*A2; Jinv = inv(J); check = Jinv*F; eta2 = eta2-check; VG = (Al*V3*mu)+(A2*eta2); G = dvec(VG,p,p); end 7.7. end while loop 7.7. 153 solve_tau.m function. function [Xi,t au] = solve_tau(phi,X i ,T 4 ,V l ,V 2 ,C2) %% 0Zo0Zo given a solution for phi, solve for the implicit parameter tau Th Th initial guess of tau %% 0Zo0Z. tau = inv(Vl’*V1)*V11*X i ; 0Zo0Z. 0Zo0Zo find tau using Newton’s method Th Th while norm(C2’*exp(T4*Xi),’fr o ’) > le-8 F = C 2 ’*exp(T4*Xi); J = C 2 ’*diag(exp(T4*Xi))*T4*Vl; Jinv = inv(J); check = Jinv*F; tau = tau-check; Xi = Vl*tau+V2*phi;. end 0Zo0Zo end while loop %% find-guess.m function. function [GamO,IamO] = find_guess(A,U,S,p,q,m,order) 0Zo0Z. 0Zo0Zo find initial guesses for Gamma and Lamda Th Th using Orthogonal Rotation to Congruence Th 0Zo0Zo write S as its svd %% Th [Ghat,L h a t ,Gh]= svd(S); 0Zo0Z. 
0Zo0Zo obtain initial guess for lamda Th %% IamO = diag(Lhat); Th Th order eigenvalues from small to large and Th Th also the corresponding eigenvectors Th 'Th if order == 2 IamO = IamO([p:-!:!]); Ghat = Ghat(:,[p:- I :1]); LamO = diag(IamO); end; Th Th partition the eigenvector matrix Th Th Ghatl = Ghat(:,I :m ) ;Ghat2 = Ghat(:,m+1:p); Th 154 %% begin with any orthogonal matrix %% %% QOl = orth(randn(m,m)); check = Q O l ; Ql=Ones(m,m); %% %% do loop until Ql converges Th Th while (norm(Q1-check)) > le-4 check = QO l ; B2 = Ghatl*Q01’; B2 = B 2 ( :,q+1:m); [U3,D3,V3] = svd(B2,*U ) ; h = sum(diag(D3)>le-20); F = V3( :, I :h) *U35.; Astar = [ A U* F ] ; [U4,D4,V4] = svd(Ghatl’*Astar); h = sum(diag(D4)>le-20); Ql = V4(:,l:h)*U4'; Th Th check if tr close to m Th % trl = trace(Ql’*Astar’*Ghatl); Th QOl = Q l ; end; % end while °h Ac = null(full(Astar)’); [U5,D5,V5] = svd(Ghat2’*Ac); h = sum(diag(D5)>le-20); Q2 = V 5 ( :,I :h)*U5’; Th Th check if tr close to p-m Th % tr2 = trace(Q2’*Ac3 *Ghat2); Th Q = oplus(Ql,Q2); GamO = [Astar Ac]*Q; %% end function Th fisher_scoringl.m function. function [mu,newphi,check_conv] = fisher_scoringl(n,phi,G a m 3S,T l ,T2,D g l , L,check_convO) Th Th one iteration of Fisher_scoring algorithm %% Th [p,p]=Size(Gam);w= size(Dgl,2); Ip2 = speye(p's2) ;Ipp = commute(p,p) ; Np = (Ip2+Ipp)/2; check_eig = I; 155 %% TL set mu to zero TL %% ZetaO = [zeros(u,I) ; ph i ] ; %% H construct matrix of Fl %% %% lam = Tl*exp(T2*phi); Lamda = dvec(L*lam,p,p); Sigma = Gam*Lamda*Gam’; invsigma = inv(Sigma); Fll = 2*Np*kron(Gam*Lamda,Gam)*Dgl; Dll = Tl*diag(exp(T2*phi))*T2; F12 = kron(Gam,Gam)*L*D11; Fl = [ Fll F12] ; 0TL TL construct information matrix TL %% pre = (n/2)*F1’*kron(invsigma,invsigma); ihat = pre*Fl; invihat = inv(ihat); %% °TL construct score function %% TL Szeta = pre*vec(S-Sigma); delta = invihat*Szeta; Zeta = ZetaO + delta; TL TL check if it is on the right direction but too far %% TL the magnitude is reduced TL TL while norm(delta(l:w,:)) > 0 . 2 5 delta. = delta/2; Zeta = ZetaO + delta; end; % end while % check_conv = Szeta5*delta; while check_conv > check_convO delta = delta/2; Zeta = ZetaO + delta; check_conv = Szeta5*delta; end; % end while % TL TL obtain mu and newphi TL TL mu = Zeta(l:w,:); newphi = Zeta(w+1:end,:); %% end function %% fisher_scoring2.m function. 156 function [mu,newphi,check_conv] = f isher_scoring2(n,p h i ,G a m ,S ,D g l , L ,check_convO,X i ,V 2 ,T 3 ,T4) %% U one iteration of Fisher scoring algorithm %% U [p,p]=size(Gam);w= size(Dgl,2); Ip = speye(p); Ip2 = speye(p'"2) ; Ipp = commute(p,p); Np = (Ip2+Ipp)/2; check_eig = I; %% TL set mu to zero TL %% ZetaO = [zeros(w,I ) ; p h i ] ; onep = ones(p,I); TL TL construct Fl TL %% wi = inv(onep’*T3*exp(T4*Xi)); lam = wi*phi(l)*T3*exp(T4*Xi); Lamda = dvec(L*lam,p,p); Sigma = Gam*Lamda*GamJ; invsigma = inv(Sigma); Fll = 2*Np*kron(Gam*Lamda,Gam)*Dgl; Hp = sparse(Ip - (lam/phi(I))♦onep’) ; Wl = sparse(diag(exp(T4*Xi))*T4); TL TL Dll with respect to phiO and phi %% %% Dll = [wi*T3*exp(T4*Xi) wi*phi(I)*Hp*T3*Wl*V2]; F12 = kron(Gam,Gam)*L*Dll; Fl = [ Fll F1 2 ] ; TL TL construct information matrix TL TL pre = (n/2)*F1’*kron(invsigma,invsigma); ihat = pre*Fl; invihat = inv(ihat); TL TL score function TL %% Szeta = pre*vec(S-Sigma); delta = invihat*Szeta; 157 Zeta = ZetaO + delta; Th Th check if is on the right direction but too far Th Th the magnitude is reduced Th %% while norm(delta(l:w,:)) > 0 . 
2 5 delta = delta/2; Zeta = ZetaO + delta; end; % end while % check_conv = Szetaj*delta; while check_conv >= check_convO delta = delta/2; Zeta = ZetaO + delta; check_conv = Szetaj*delta; end; % end while % Th Th obtain mu and newphi Th %% mu = Zeta(l:w,:); newphi = Zeta(w+1:end,:); Th end function Th find-Vstar.m function. function [V3,V4] = find_Vstar(A,Al,M,Gam) Th Th find V = (V3 V4) ensured that C*(Ip2-Ipp)*Al is invertable Th Th [p,q]= size(A); Ip = speye(p); Ip2 = speye(p"2); Ipp = commute(p,p); U Th M is the first m columns of Ip Th Th find Mc which is Ip taken off M js column Th Th b = M J*[l:p] J ; Mc = Ip ; M c (: ,b) = []; Th Th find V3 and V4 using the SVD Th Th V 3 JV3= I, V 4 JV4 = I and V 3 JV4 = 0 Th %% C = kr o n (MeJ,A J *Gam) ; Cin = C*(Ip2-Ipp)*Al; [U,D,V]=svd(full(Cin)); h = sum(diag(D)>le-20); Th Th obtain V4 and V3 %% 158 %% V4 = V(:,l:h); V3 = n u l l(V40; V3 = sparse(V3); V4 = sparse(V4); %% end function %% find-Vstar-bart.m function. function [V3,V4] = f ind_Vstar_bart(A ,A l ,M ,Gam) %% %% find V = (V3 V4) %% H using rref to increase zero entries %% %% Cp, q] = size (A); Ip = speye(p); Ip2 = speye(p~2); Ipp = commute(p,p); %% °/0°/0 Mc is the last p-m columns of Ip %% %% b = M 5*[l:p]’; Mc = Ip ; Mc (: ,b) = [] ; C = kron(Mc',A'*Gam); Cin = C * (Ip2-Ipp)*A1*A1’; AV4 = r r e f (Cin)5; [U,D,V]=svd(full(Cin)); h = sum(diag(D)>le-20); AV4 = A V 4 ( :,I :h); 0Z0Z, 0Z0Z obtain V4 b/c A1'A1V4=V4 %% %% V4 = Al ’ =t=AV4; 0Z0Z 0Z0Z obtain V3 which is orthogonal to V4 %% %% V3 = nu l l (full(V4)’); AV3 = rr e f (V3’* A 1 ; V3 = A1’ *AV3; V3 = sparse(V3); V4 = sparse(V4); 0Z0Zo end function 0Z0Z Solvejtnlel.m function. function [Gamma,Lamda,p h i ,L a ,loglk,conv] = solve_mleI(n,A ,A l ,A 2 ,L ,M , Gam,lam,S,Tl,T2,Dgs) 0Z0Z 159 %% compute maximum likelihood estimates of Gamma and Lamda %% %% [p,q]= size(A); Ip = speye(p); %% 7o7o initial guess of zeta %% %% phi = inv(T2’*T2)*T21*log(inv(Tl3 *T1)*T1’*lam); G = ones(p,p); loglk = [];i = 0; Gamma = Gam; check_conv = I; check_convO = le8; conv = I; 7.7. 7.7. check_conv - Szeta’*invihat*Szeta 7.7. 7.7. La = -((n*p)/2)-(n/2)*log(det(S)); SO = S; while sqrt(check_conv)> le-6 SO = Gam’*S0*Gam; A = rref(A’*Gam) ’; Gam = Ip; [V3,V4] = find_Vstar(A,Al,M,Gam); Dgl = Dgs*V3; [mu,newphi,check_conv] = fisher_scoringl(n,phi,Gam,SO,Tl,T2,Dgl, L,check_convO); check_convO = check_conv; 7.7. 7.7. check convergence 7.7. 7.7. if max(abs(T2*newphi)) > 14 conv = 0; return; end 7. end if % [G,eta2] = solve_eta2(A,Al,A2,V3,M,mu,Gam); phi = newphi; Gam = G; Gamma = Gamma*G; 7.7. 7.7. computer log likelihood function without constant7.7. 7.7. Iamda = Tl*exp(T2*phi); Lamda = dvec(L*lamda,p,p); Sigma = Gam*Lamda*Gam’; invsigma = inv(Sigma); loglk = [loglk ; [ -(n*trace(SO*invsigma))/2-(n*log(det(Sigma)))/2 check_conv]]; end; 7.7. end of while loop 7.7. %% end fu n c t io n Th solve_mle2.m function. 
160 function [Gam,Lamda,p h i O ,Xi3L a 3loglk,conv] = solve_mle2(n,A l ,A 2 ,L ,Gam, lam,S,Dgs,T3,T4,C2) Th Th compute maximum likelihood estimate of Gamma,Lamda,phiO Th 'Th [p,p]= size(Gam); Ip = speye(p); [p,q3]= size(T3); oneq3 = ones(q3,l);onep = ones(p,I); %% Th initial guess of Xi 'Th Th X = [T4 oneqS ]; Xi = inv(X’*X)*X’*log(inv(T3’*T3)*T3’*lam) ; Xi (end) = [] ; G = ones(p,p); loglk = [];i = 0; check_conv = I; check_convO = IeS; conv = I ; La = -((n*p)/2)-(n/2)*log(det(S)); trs = trace (S); W = diag(exp(T4*Xi))*T 4 ; [U,D,V] = svd(C2'*W); h = sum(diag(D)>le-20); Vl = V ( :,I :h); V2 = V (:,h+1:end); phiI = inv(V2’*V2)*V2’*Xi; [Xi,tau] = solve_tau(phil,Xi,T4,Vl,V2,C2); phi = [trs;phi1]; Th Th check_conv = Szeta’*invihat*Szeta 'Th Th while sqrt(check_conv)> le-6 W = diag(exp(T4*Xi))*T4; [U,D,V] = svd(C2’*W ) ; h = sum(diag(D)>le-20); Vl = V ( :,I :h); V2 = V(:,h+1:end); p h i (2:end) = inv(V2’*V2)*V2’*Xi; [Xi,tau] = solve_tau(phi(2:end),Xi,T4,Vl,V2,C2); [mu,newphi,check.conv] = fisher_ scoring2(n,p h i ,Gam,S ,D g s ,L , check.convO,X i ,V 2 ,T 3 ,T 4 ) ; check.convO = check.conv; [Xi,tau] = solve_tau(newphi(2:end),Xi,T4,Vl,V2,C2); 'Th 'Th check convergence Th 161 if max(abs(T4*Xi)) > 14 conv = 0; return; end % end if % eta = solve_eta(Al,A2,mu); VG = Al*mu+A2*eta; G = dvec(VG,p,p); Gam = Gam*G; p h i (I) = newphi(l); %% %% computer log likelihood function without constant0/,0/, U lamda = p h i (I)*T3*exp(T4*Xi)/ (onep1*T3*exp(T4*Xi)); Lamda = dvec(L*lamda,p,p); Sigma = Gam*Lamda*Ga m 5; invsigma = inv(Sigma); loglk = [loglk ; [ - (n*trace(S*invsigma))/2-(n*log(det(Sigma)))/2 check_conv]]; end; °/,0/, end of while loop 7,7, phiO = p h i (I); 7,7, end function 7,7, Bartlettl.m function. function elrt = Bartlettl(Gam,Lam,phi,Tl,T2,A2,L,V3,n,m,nu,Dgs) p = Iength(Gam); ci = size(Tl,2); k = length(m); df=round(sum(nu)); Idf=speye(df); Ip = speye(p); Ic = speye(ci); Idf 2=speye (df'‘2) ; permdf=Commute(df,df); permp = commute(p,p); permp2 = commute (p'‘2 ,p~2) ; permpI = commute(nu(l),nu(l)); Ipp = speye(p~2); Ndf=Idf2+permdf; tNp = (Ipp+permp); Impp = Ipp-permp; Ik = speye(k); Dp = dup(p); Lam = sparse(Lam); Sigma = Gam*Lam*Gam’; 162 Sig = Sigma; Sigi = inv(Sigma); for il=l:2 I{il>=speye(nu(il)); Ipf{il>=speye(nu(il)*p); for 12=11:2 if isempty(nu(il)*nu(i2))==0 I_{il,i2}=commute(nu(il),nu(i2)); I_{i2,il}=I_{il,i2}'; else I_{il,i2}=[]; I_{i2,il}=[] ; end end end 111 = speye(nu(l)~2); NI = (Ill+I_{l,l})/2; nparam=round(sum(nu)); elrt=nparam; 7.7. 7,7» Create Dg(I) , Dg(2) ,Dg(S) 7.7. 7.7. Dl_g_m = sparse(Dgs*V3); K4 = sparse(kron3(Ip,vec(Ip)',Ip)); kdgl = sparse(kron(Dl_g_m,Dl_g_m)); K = K4*kdgl; D2_g_mm = sparse(A2*Dp’* K ) ; Kl = sparse(kron(I{l},permpl)); K2 = sparse(kron(D2_g_mm,Dl_g_m)); KS = sparse(K2+permp2*K2*Kl); K5 = sparse(kron(Dl_g_m,permp*D2_g_mm)); K6 = sparse(kron(Ipp,permp)); clear K Kl K2 DgO Dg Wl D3_g_mmm = -A2*Dp’*K4*(K6*K3+K5); clear KS K4 K5 K6 7.7. 7.7. Create Dl(I) ,Dl(2) ,Dl(S) 7.7. 7.7. 
Dl_l_p = sparse(Tl*diag(exp(T2*phi))*T2); W2 = sparse(I,I ) ; WS = sparse(l,l); for i = I : ci ei = I c (:,i ) ; W2 = W2 + kron(ei*ei',ei*ei’) ; WS = WS + kronS(ei*ei’,ei*ei',e i ’); end; 163 ketl = sparse((kron(exp(T2*phi) ,T V ) ) ’) ; D2_l_pp = sparse(ketl*W2*kron(T2,T2)); D3_l_ppp = sparse(ketl*W3*kron3(T2,T 2 ,T2)); clear W2 W3 Sig=sparse(Sig); Sigi=sparse(Sigi); V2i=kron(Sigi,Sigi ) ; index=[I:2]5; jj=find(nu==0); index(j j) = [] ; U %% Use Dl_{i> for first derivative of vec Sigma with respect to theta{i}'%% %% Theta = (mu5 p h i 5) 5%% Vh Use D2_{i,j> for second derivative of vec Sigma with respect to Vh Vh theta-Ci}5 x theta{j > 5 Vh °h°h Use D3_{i,j,k} for third derivative of vec Sigma with respect to%% %% thetafi}5 x thetalj}5 x theta-Cj> 5Vh Vh kgg = sparse(kron(Gam,Gam) ); kgl = sparse(kron(Gam*Lam,Gam)); kglg = sparse(kron3(Gam,vec(Lam)5,Gam)); kgig = sparse(kron3(Gam,vec(Ip)5,Gam)); kid = sparse(kron(L*Dl_l^p,Dl_g_m)); D1_{1> = sparse(tMp*kron(Gam*Lam,Gam)*Dl_g_m); Dl_{2> = sparse(kgg*L*Dl_l_p); D2_{1,1} = tNp*((kglg*(-kdgl))+kgl*D2_g_mm); D2_{2,2} = kgg*L*D2_l_pp; D2_{2,1> = tNp*kgig*kld; D2_{1,2> = D2_{2,1>*I_{1,2}; kdld2 = kron(-Dl_g_m,D2_g_mm); kpd2dl = kron(permp*D2_g_mm,Dl_g_m); DD_4 = kdld2+2*kpd2dl*kron(I{l},Nl); D3_{1,1,1> = tMp*(kgl*D3_g_mmm+kglg*DD_4); clear DD_4 D3_{2,2,2> = kgg*L*D3_l_ppp; D3_{2,2,1> = tMp*kgig*kron(L*D2_l_pp,Dl_g_m); D3_{2,1,2} = D3_{2,2,l>*kron(I{2>,I_{1,2 » ; D3_{1,2,2> = D3_{2,l,2}*kron(I_{l,2},I{2}); DD_3 = kron 3 (I p ,permp,Ip)*kdgl; D3_{2,1,1> = tNp*((kgig*kron(L*Dl_l_p,D2_g_mm))+ (kron3(vec(L*Dl_l_p)5,Ga m ,Gam)*kron(I{2>,DD_3))); clear DD_3 D3_{1,2,1}=D3_{2,I ,l}*kron(I_{l,2 > ,I { 1 » ; D3_{1,1 ,2}=D3_{1,2, l>*kron(I{l>, I_{1,2»; 164 clear kgg kgl kglg kgig kid kdld2 kpd2dl U %% Computation of Fl %% %% Fl=sparse(I,I); F2=sparse(l,l); F3=sparse(l,l); for 1 = 1 :length(index) Il=Index(I); Fl=Fl+Dl_{ll>*make_E(ll,nu); end; Itheta=sparse(FI5 *V2i*Fl/2); Ithetai=Inv(Itheta); Fisher_InfO=Itheta; clear D1_{1> Dl_{2> %% %% Computation of F2, F l l , & F3 %% %% nindex=length(index); for il=lrnindex ll=index(il); El=make_E(ll,nu); for i2=l:nindex 12=index(i2); E2=make_E(12,nu); if isempty(D2_{11,12>) ==0 F2=F2+D2_{H,12}*kron(El,E2) ; end; end end Fll=reshape(F2,p's2*nparam,nparam); clear D2_{1,1> D2_{2,2} D2_{1,2> D2_{2,1> for il=l:nindex ll=index(il); El=make_E(ll,nu); for i2=l:nindex 12=index(i2); E2=make_E(12,nu); E12=kron(El,E2); for i3=l:nindex 13=index(i3); if isempty(D3_{11,12,13}) == O E3=make_E(13,nu); E123=kron(E12,E3); F3=F3+D3_{11,12,13>*E123; 165 end end end end clear D3_{2,2,2> D3_{2,2,1} D3_{2,1,1> D3_{1,2,2> D3_{2,1,2} D3_{1,2,1> D3_{1,1,2} zeta=zeros(10,I ) ; Ithetai2=kron(Ithetai,Ithetai); ithetai=vec(Ithetai); k3_tmp4=kron3(Ip,vec(Sigma)’,Ip); V3=kron3(Sigi,vec(Sigi),Sigi); F2dd=V2i*F2; Fldd=V2i*Fl; Flddd=V3*Fl; FlFl=kron(Fl,Fl); FlIthetai=Fl*Ithetai; tmp=Fldd’*F2; K3s=2*Flddd ’ *F IF I-re shape (tmp, df '"2, df) ’ -tmp/2; tmp=V3'*FlFl; Ul=reshape (F2dd,p''2*df ,df ) ... 2*reshape(reshape(tmp,df*p~2,df) ’,df,df*p~2); tmpl= kron(Ip,kron(vec(Sigi)’,Ip)*kron(Ip,Fl)); U2= -6*reshape(Flddd',p*2*df,p"2)*F2 + ... +2*tmpl,*(tMp+permp)*tmp + ... reshape (reshape (V2i*F3 ,p''2*df'2, df ) ’ ,pT2*df,df~2) ; Ll=Ul*vec(FlIthetai)/2; L2=vec(Ul*kron(Idf,Fllthetai))/2; tmp=krbn3(Sigi,v e c (Sigi) ’ ,Ip)*FlFl; K4s=12*FlFl’*V3*F2 ... -9*tmp’*permp*tmp ... -2*reshape(Fldd,*F3,df'‘2,df"2) . . . 
-(3/2)*reshape(F2dd’*F2,df'"2,df ~2) ; clear V3 F2dd FlFl tmp tmpl zeta(4)=Ll5 *Ithetai*Ll+L2’*kron(permdf,Ithetai)*L2; clear LI L2 tmp=K3 s*ithetai; zeta(5)=vec(K3s*Mf *Ithetai2*K3s’) ’*ithetai/4+tmpJ*lthetai*tmp/4; clear tmp zeta(10)=vec(Ndf*Ithetai2)’*vec(K4s)/12+ithetai5 *K4s*ithetai/12; zeta(l) = (l/2)*ithetai'*vec(Ul*kron(Ithetai*Fldd',tNp)*k3_tmp4’*F1) ; zeta(2) = (l/3)*ithetai’*vec(K3 s *kron(FIIthet ai,Fllthetai)’*Flddd); zeta(3)=(1/4)*ithetai’*vec(U1*kron(Itheta i ,tNp*kron(Sigma,Sigma))*U1’); zeta(6)=(l/2)*ithetai’*vec(K3s*Ndf*kron(Ithetai,Fllthetai’)*U1’); Tmp=KSs’*Ithetai*Ul; 166 ze t a (7)=Ithetai’*Tmp*vec(FlIthetai)/2; zeta(8)=(1/6)*vec(kron(FlIthetai,Ithetai)*Ndf) ’*vec(U2); ze t a (9)=ithetai’*U2'*vec(Fllthetai’)/6; zeta=zeta/n; elrt=df+sum(zeta); %% end function %% schott-func.m function. function [Tkq,Tstarl,Tstar2,P_Tkq,P_Tstarl,P_Tstar2] = schott_func(A,S, n,p,q,k) Ip = speye(p); df = k*q; Th Th find Tkq using SVD od S Th Th [Ghat,Lhat,Tmp] = svd(S); I = diag(Lhat); Y2 = Ghat(:,k+l:p); T = O; for j = l:k d-Cj} = [] ; for i = k+l:p d{j> = [d{j> ; l(j)*l(i)/(l(i)-l(j))'12 ] ; end; D-Cj} = diag(d{j»; Tj = trace(A’*Ghat(:,j)*Ghat(:,j ) ’*A*inv(A,*Y2*D{j}*Y2,*A)); T = T+Tj; end; % end for % Tkq = n*T; R = Y2*Y2’*A; [u,dl,v] = svd(R); r = sum(diag(dl)>le-20); if r == q PJTkq = I - chi2cdf(Tkq,df); end; if r < q P_Tkq = 0; end; Th Th find Bartlett adjustment factor Th Th Ghat22 = Ghat(p-q+l:p,k+l:p); cl = 0;c2 = 0;c3 = 0; for j = l:k Sj = Ghat22*D{j}*Ghat22’; 167 Q-Cj } = Ghat22 ’ *inv(S j ) *Ghat22; for u = k+1 :p • cll = q * l (u)/ ( K u ) - I (j)) ; cl2 = 2*Q{j>(u-k,u-k)''2*(l(u)''2*l(j)'-3*(2*l(j)+3*l(u))/(l(u)-l(j))-6) ; cl3 =Q{j> (u-k,u-k)* (I(u)*1(j )*(2*1(u)"3-7*1(u)"2*1(j )+4*1(u)*1(j )"2+ l(j)"3)/(l(u)-l(j))"5); cl = cl+(cll+cl2+cl3); for v = k+1:p if v < u c21 = (Q{j > (u-k,u-k)+Q{j > (v-k,v-k))*1(u)* 1 (v)*1(j)"2/((I(u)-I(j ))"2 *(l(v)-l(j))"2); deno = ((l(u)-l(j))"4*(l(v)-l(j))"4); c22 = 2*Q{j > (u-k,v-k)"2*1(u)* 1(v)*1(j)"3*(I(u)* 1(v)*1(j)-3*1(j)"3+ (Ku) +1 (v) ) * (I (v) * (I (j ) -I (u) ) +1 (u) * (I ( j ) -I (v) ) +1 ( j ) "2) ) /deno; c23 = 2*Q{ j > (u-k,u-k) *G){ j > (v-k, v-k) *1 (u) *1 (v) *1 (j ) "3* (I (j ) "3-3*1 (u) *1 (v) *1 ( j) +1 (u) *1 (v) * (Ku) +1 (v) ))/ deno; c2 = c2+ (c21+c22-c23); end; end; end; end; for i = l:k for j = l:k if i ~= j for u = k+1: p c31 = Cj{j>(u-k,u-k)*l(u)*l(j)*l(i)*(l(u)"2-l(j)*l(i))/((l(u) -l(j))"2*(l(u)-l(i))"2*(l(j)-l(i))); c3 = c3 + c31; end; end; end; end; Ckq = l/2*(k-l)+inv(k*q)*(cl - c2 - c 3 ) ; Tstarl = (l-Ckq/n)*Tkq; Tstar2 = Tkq/(l+Ckq/n); P_Tstarl = l-chi2cdf(Tstarl,df); P_Tstar2 = l-chi2cdf(Tstar2,df); Th end function Th Cl function. function f = CI(c01,n,p,Al,A2,L,Gam_ini,lhat,S,Dgs,T3,T4,Cl,loglk) p = pvalue(c01,n,p,Al,A2,L,Gam_ini,lhat,S,Dgs,T3,T4,Cl,loglk); f = (p-0.05)"2; 168 pvalue function. function p = pvalue(cOl,n,p,Al,A2,L,Gam_ini,Ihat,S,Dgs,T3,T4,C l 1Ioglkl) C2 = T 3 ,*(Cl-ones(p,l)*c01); [GamO,L a m O ,phiOO,Xi5La,loglk,convO]=solve_mle2(n,A l ,A 2 ,L ,Gam_ini,Ihat,S , Dgs,T3,T4,C2); LRT = 2*(loglkl-loglk(end,l)); p = l - chi2cdf(LRT5I) The matrix C* from Example 2. C =[] ;L=[] ;1=[] ; Ip = speye(p); linear = [1;2;3;4;5;6;7]; for i= l :6 H i } = linear."!; H = [ones(p,I) C ] ; P = ppo(H); Ci = (Ip-P)*l{i}; C = [C Ci] ; end; for i= l :6 C(:,i)= C ( :,i)/sqrt(C(:,i),*C(:,i)); end; MONTANA STATE UNIVERSITY - BOZEMAN 3 762 10359845 2