Dimension reduction in PCA: likelihood-based methods

by Kamolchanok Choochaow

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Statistics

MONTANA STATE UNIVERSITY
Bozeman, Montana

July 2002

© Copyright by Kamolchanok Choochaow (2002)

APPROVAL of a dissertation submitted by Kamolchanok Choochaow

This dissertation has been read by each member of the dissertation committee and has been found to be satisfactory regarding content, English usage, format, citations, bibliographic style, and consistency, and is ready for submission to the College of Graduate Studies.

Robert J. Boik (Signature) (Date)

Approved for the Department of Mathematical Sciences
Kenneth L. Bowers (Signature) (Date)

Approved for the College of Graduate Studies
Bruce McLeod (Signature) (Date)

STATEMENT OF PERMISSION TO USE

In presenting this dissertation in partial fulfillment of the requirements for a doctoral degree at Montana State University, I agree that the Library shall make it available to borrowers under rules of the Library. I further agree that copying of this dissertation is allowable only for scholarly purposes, consistent with "fair use" as prescribed in the U.S. Copyright Law. Requests for extensive copying or reproduction of this dissertation should be referred to Bell & Howell Information and Learning, 300 North Zeeb Road, Ann Arbor, Michigan 48106, to whom I have granted "the exclusive right to reproduce and distribute my dissertation in and from microform along with the non-exclusive right to reproduce and distribute my abstract in any format in whole or in part."

Signature ____________________  Date ____________________

ACKNOWLEDGEMENTS

I would like to thank my adviser, Professor Robert J. Boik, for his invaluable assistance and guidance during the preparation of this thesis. I truly appreciate his encouragement and patience. I am also grateful to John Borkowski, Jim Robinson-Cox, Steve Cherry, and Bill Quimby for their advice throughout my graduate studies. Lastly, I would like to thank my dad, mom, and husband for their love and support.

TABLE OF CONTENTS

LIST OF TABLES ..............................................
vii LIST OF FIGURES.................................................................................................. x 1. INTRODUCTION ............................................................................................. I Principal Component Analysis......................................................................... 2 Reduction in the Number of Components....................................................... 4 Reduction in the Number of Original Variables.............................................. 12 Example I .................................... 30 Example 2 ........................................................................................................... 32 2. PARAMETERIZATION....................... 35 Spectral M odel................................................................................................... 35 Parameterization of Eigenvalues ...................................................................... 36 Reduction in the Number of Variables..................................................... 37 Reduction in the Number of Components................................................. 41 Parameterization of Eigenvectors ..................................................................... 52 Solving for Implicit Parameters........................................................................ 70 Solving for r ................................................................................................ 70 Solving for r/ ................................................................................................ 71 3. ESTIMATING PARAMETERS AND CONSTRUCTING ASYMP­ TOTIC DISTRIBUTIONS ........................................................ 72 Loglikelihood Function....................................................................................... 73 Fisher Scoring Algorithm................................................................................... 75 Solving the Likelihood Equation...................................................................... 76 Finding the Initial Guesses.............................................................................. 77 Asymptotic Distributions of Estimators.....................................'.................... 81 Normal Population....................................................................................... 81 Non-normal Population............................................................................... 85 4. TESTING HYPOTHESIS AND BARTLETT CORRECTION .................. 87 Hypotheses................................. 87 Likelihood Ratio T est................................................................ 89 Bartlett Correction..................... ■....................................................................... 92 Third and Fourth Order Moments............................................................. 92 Expectation of LRT Statistic ..................................................................... 98 Under a Restricted Ha ................................................................................. 100 Under an Unrestricted Ha .......................................... 100 Matrix Expressions for Z i and K i ................................................................... 101 5. SIMULATION..................................................................................................... 108 Simulation Study of Variable Reduction (Redundancy) Tests ............... 
108 The Comparison Test................................................................................... 109 Variance Reduction Procedure................................................................... I l l Discussion..................................................................................................... 132 Simulation Study of Component Reduction T ests ........................................... 133 Numerical Examples.......................................................................................... 135 Example I. (continued) ............................................................................... 135 Example 2. (continued) ............................................................................... 136 6. CONCLUSION......................................................................... 138 REFERENCES................................................................................................. 140 APPENDICES........................................................................................................... 144 APPENDIX A - Notation................................................................................. 145 APPENDIX B - List of Matlab Subprograms............................................... 147 APPENDIX C - Programming Codes............................. ; .............................. 151 vi vii LIST OF TABLES Table Page 1. Eigenvalues of Covariance Matrix: Photographic Example.................... 31 2. Eigenvectors of Covariance Matrix: Photographic Example.................. 31 3. Eigenvalues of Covariance Matrix: Women’s Track Example................. 34 4. Eigenvectors of Covariance Matrix: Women’s Track Example............... 34 5. Probabilities of Type I error when (p, k, q) = (10,2,4) and A = 04,10,25)' ............................................................................................. 116 6. Probabilities of Type I error when (p, k, q) = (10,2,4) and A = (lg, 5,10)' 116 7. Probabilities of Type I error when (p, k, q) = (10,2,4) and A = (1'8, 2,10)' 117 8. Probabilities of Type I error when (p,k,q) = (10,2,4) and A = (4 .1 .5 .10) ' .......................................................................................... 117 9. Probabilities of Type I error when (p, k, q) — (10,2,1) and A = (4 .10 .25) ' .......................................................................................... 118 10. Probabilities of Type I error when (p, k, q) = (10,2,1) and A = (1'8, 5,10)' 118 11. Probabilities of Type I error when (p, k, q) = (10,2,1) and A = (18, 2,10)' 119 12. Probabilities of Type I error when (p, k, q) = (10,2,1) and A = (4 .1 .5 .10) ' .......................................................................................... 119 13. Probabilities of Type I error when (p,k,q) — (10,4,2) and A = (4.10.10.10.25) ' ................................................................................ 120 14. Probabilities of Type I error when (p,k,q) = (10,4,2) and A = (4 .5 .5 .10 .10) ' .................................................................................... 120 15. Probabilities of Type I error when (p,k,q) = (10,4,2) and A = (4 .2 .5 .10 .10) ' 121 16. Probabilities of Type I error when (p,k,q) = (10,4,2) and A = % 1.5,5,10,10)'.................................................................................... 121 17. Probabilities of Type I error when (p,k,q) = (10,4,5) and A = (1^,10,10,10,25)'................................................................................... 122 18. 
Probabilities of Type I error when (p,k,q) — (10,4,5) and A = % 5,5,10,10)'......................................... 122 19. Probabilities of Type I error when (p, k, q) = (10,4, 5) and A = (1^,2,5,10,10)'....................................................................................... 123 20. Probabilities of Type I error when (p, k, q) = (10,4,5) and A = (1^,1.5,5,10,10)'.................................................................................... 123 21. Probabilities of Type I error when (p,k,q) = (15,2,1) and A = % , 15,35)'............................................................................................. 124 22. Probabilities of Type I error when (p,k,q) = (15,2,1) and A = (1L;,10,15)'............................................................................................. 124 23. Probabilities of Type I error when (p, k, q) = (15,2,1) and A = 2,15)'............................................................................................... 125 24. Probabilities of Type I error when (p, k, q) — (15,2,1) and A = (T13, 1.5,15)'....................................................... 125 25. Probabilities of Type I error when (p,k,q) = (15,4,2) and A = (T11, 15,15,15,35)'................................................................................. 126 26. Probabilities of Type I error when (p,k,q) = (15,4,2) and A = (T11, 10,10,15,15)'................................................................................. 126 27. Probabilities of Type I error when (p,k,q) — (15,4,2) and A = (T11, 2,10,15,15)' .................................................................................. 127 28. Probabilities of Type I error when (p, k, q) = (15,4,2) and A = (Tn , 1.5,10,15,15)' ............................................................................... 127 29. Power of test when A = (1'8,10,25)', k = 2, q — 1 .................................. 128 30. Power of test when A = (T8, 2,10)', k = 2, q = l ................................... 129 viii 31. Power of test when A = (lg, 10,10,10,25)', k = A, q = 2 ........................ 130 32. Power of test when A = (lg, 2,5,10,10)', k = '4, q = 2 ......................... 131 33. Percentage coverage of Cl when (p, k) = (8,2) and A = (lg, 10,25)' .... 134 34. Percentage coverage of Cl when (p, k) = (8,2) and A = (1'6, 2,10)'........ 134 35. Percentage coverage of Cl when (p, k) = (8,4) and A = (I4 ,10,10,10,25)' 134 36. Percentage coverage of Cl when (p, k) = (8,4) and A = (I4 ,2,10,10,10)' 134 37. P-value of Testing Hypothesis: Women’s Track Example....................... 136 ix XLIST OF FIGURES Figure Page 1. Ideal Scree Graph......................................................................................... 6 2. A Profile Plot of the Expected Mean........................................................ 33 3. Power of test when A = (Iz8, 10,25)', k = 2, q — I .................................. 128 4. Power of test when A = (18, 2,10)', Jc = 2, q = I .................................... 129 5. Power of test when A = (1'6, 10,10,10,25)', k = 4, q = 2 ......................... 130 6. Power of test when A = (1'6, 2,5,10,10)', k = 4, q = 2 ........................... 131 ABSTRACT The main objective of this thesis is to develop procedures for making inferences about the eigenvalues and eigenvectors of a covariance matrix. Specifically, new pro­ cedures for examining dimension reduction in principal component analysis (PCA) are developed. 
The dimension reduction consists of the following two aspects: reduc­ tion in the number of components and reduction in the number of original variables. The procedures are based on a likelihood approach. Parameterizations of eigenval­ ues and eigenvectors are presented. The parameterizations allow arbitrary eigenvalue multiplicities. The use of the Fisher scoring algorithm for computing maximum like­ lihood estimates of the covariance parameters subject to multiplicity and other con­ straints is discussed. Asymptotic distributions of estimators of covariance parameters are derived under normality and non-normality. Likelihood ratio tests and Bartlett corrections are described. Simulation studies show the effectiveness of Bartlett cor­ rections. The new procedures are demonstrated to give better overall results than some existing methods. ICHAPTER I INTRODUCTION The main objective of this thesis is to develop procedures for making inferences about the eigenvalues and eigenvectors of a covariance matrix. Specifically, new pro­ cedures for examining dimension reduction in principal component analysis (PCA) are developed. The procedures are based on a likelihood approach. The first chapter of this thesis briefly describes principal component analysis followed by a discussion of dimensionality reduction methods. Two examples are given. In Chapter 2, parameterizations of eigenvalues and eigenvectors are presented. Two eigenvalue parameterizations are adopted under different objectives of use. The parameterizations allow arbitrary eigenvalue multiplicities. The Newton-Raphson algorithm is used to solve implicit parameter equations. The hypothesis test of re­ dundancy is described and an eigenvector parameterization under redundancy is pro­ posed. Chapter 3 describes the use of the Fisher scoring algorithm for computing max­ imum likelihood estimates of the covariance parameters subject to multiplicity and other constraints. Asymptotic distributions of estimators of covariance parameters are derived under normality and non-normality. In Chapter 4, likelihood ratio tests and Bartlett corrections are described. The results of simulations that examine the 2effectiveness of Bartlett corrections are presented in Chapter 5. Two numerical ex­ amples also are illustrated. Principal Component Analysis Investigators often measure or make observations on a large number of variables. There are several useful techniques to reduce the dimensionality of data without the loss of much information such as factor analysis and cluster analysis. Principal component analysis also is one such technique and is one of the most widely-used multivariate techniques. Typically, principal component analysis is used to reduce the dimensionality of a data set, while retaining as much of the original information as possible. This is achieved by transforming the original set of variables into a smaller set of linear combinations called principal components. These new variables are uncorrelated and ordered so that the first principal component accounts for the largest proportion of the variation present in the original set of variables. The usual objective of the analy­ sis is to determine whether the first few components account for most of the variation in the original data. If they do, then these components can be used to summarize the data with little loss of information. This dimension reduction can be useful in simplifying subsequent analyses. Before defining principal components, some notation will be described. 
Matrices will be denoted by boldface upper case letters. Vectors will be denoted by boldface lower case letters. A diagonal matrix having diagonal elements a_1, a_2, . . . , a_n is denoted by diag(a_1, . . . , a_n), and the trace of a matrix A is denoted by tr(A). The p × p identity matrix is denoted by I_p.

Suppose that the vector of original variables y = (y_1 y_2 · · · y_p)' has a positive definite covariance matrix Σ. For mathematical convenience and without loss of generality, assume that the mean of y_i is zero for all i = 1, 2, . . . , p. Because Σ is symmetric and positive definite, all eigenvalues of Σ are real and positive. Let λ_1 ≥ λ_2 ≥ · · · ≥ λ_p > 0 be the ordered eigenvalues of Σ and let Γ = (γ_1 γ_2 · · · γ_p) be a p × p orthogonal matrix such that

Σ = ΓΛΓ',   (1.1)

where Λ = diag(λ_1, λ_2, . . . , λ_p). That is, γ_i is an eigenvector of Σ corresponding to the eigenvalue λ_i. The principal components of y are the entries of the p-vector z, where z is the linear combination

z = Γ'y.   (1.2)

The covariance matrix of z, Σ_z, can be written as Σ_z = Γ'ΣΓ. Substituting Σ from (1.1) yields Σ_z = Γ'(ΓΛΓ')Γ = Λ. Hence the components z_1 = γ_1'y, z_2 = γ_2'y, . . . , z_p = γ_p'y are uncorrelated, and the variance of z_i is λ_i. The maximum value of the variance of γ'y among vectors satisfying γ'γ = 1 is equal to λ_1, the largest eigenvalue of Σ. This maximum occurs when γ is an eigenvector of Σ corresponding to λ_1. That is,

max_{γ'γ = 1} γ'Σγ = γ_1'Σγ_1 = λ_1.

Then the component z_1 has the largest variance λ_1, z_2 has the second largest variance λ_2, and so on. The total variance of the p principal components is equal to the total variance of the original variables, so that

Σ_{i=1}^p λ_i = tr(Σ).

Consequently, the jth principal component accounts for a proportion λ_j / tr(Σ) of the total variance of the original variables.

This thesis focuses on two major aspects of dimensionality reduction:

1. Reduction in the number of components.
2. Reduction in the number of original variables.

Reduction in the Number of Components

The major problem in reducing the number of components is deciding how many principal components should be retained in order to account for most of the variability in the original data. Some rules of thumb have been suggested. Kaiser (1958) suggested a cut-off point: retain the components whose sample variances are greater than the average, that is, greater than 1 for a correlation matrix. It can be argued that if one variable is nearly independent of all other variables, it will dominate a component with variance slightly less than 1, assuming that the variables have been standardized. Since this variable provides information independent of the other variables, there is no reason to discard it. Kaiser's rule is arbitrary and tends to retain too few components (Jolliffe, 1986). Rencher (1995) suggested that investigators should retain sufficient components to account for a specified percentage of the total variance. Figures between 70 and 90 percent are generally suggested.

Three methods are in common use for determining the number of components to be retained. The first method is graphical and is called a scree graph. This technique was discussed and named by Cattell (1966). The scree graph consists of a plot of the ordered eigenvalues against eigenvalue number. With this plot, the components to be retained correspond to the eigenvalues plotted in the steep curve above the straight line formed by the smaller eigenvalues. Thus, in Figure 1, the first two components would be retained.
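As an informal illustration of how a scree graph is produced in practice, the following short Matlab sketch (not part of the original text) sorts the eigenvalues of a sample covariance matrix and plots them against their index. The simulated data matrix Y and its dimensions are hypothetical.

    % Minimal sketch of a scree graph; the data are simulated for illustration only.
    n = 100; p = 9;                              % hypothetical sample size and dimension
    Y = randn(n, p) * diag([5 3 ones(1, p-2)]);  % two dominant directions by construction
    S = cov(Y);                                  % sample covariance matrix
    l = sort(eig(S), 'descend');                 % ordered sample eigenvalues l_1 >= ... >= l_p
    plot(1:p, l, '-o');
    xlabel('Eigenvalue number'); ylabel('Eigenvalue'); title('Scree graph');
    cumprop = cumsum(l) / sum(l)                 % cumulative proportions of total variance

The elbow of the plot, or the cumulative proportions, can then be read in the manner described above.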
Jolliffe (1986) suggested plotting log(λ_i) rather than λ_i. This plot is known as a log-eigenvalue diagram.

The second method is testing the hypothesis that the last m = p − k eigenvalues of Σ are equal. Anderson (1963) derived the likelihood ratio test for H_0k: λ_{k+1} = · · · = λ_p = λ against the alternative that at least two of the last m eigenvalues are different.

Figure 1. Ideal Scree Graph (ordered eigenvalues plotted against eigenvalue number).

Let y_1, y_2, . . . , y_N be a random sample of size N = n + 1 from a p-variate normal distribution with mean μ and covariance matrix Σ. If y can be written as Cw + e, where C is a p × k matrix with rank k, E(e) = 0, var(e) = σ²I_p, and var(w) = I_k, then var(y) = Σ = CC' + σ²I_p. Write CC' in diagonal form as CC' = Γ_1Λ_1Γ_1' and let Γ_2 be an orthogonal complement to Γ_1. That is, I_p = Γ_1Γ_1' + Γ_2Γ_2' and (Γ_1 Γ_2) ∈ O(p), where O(p) denotes the group of orthogonal p × p matrices. The covariance matrix of y can be rewritten as

Σ = Γ_1Λ_1Γ_1' + σ²I_p = Γ_1Λ_1Γ_1' + σ²(Γ_1Γ_1' + Γ_2Γ_2') = Γ_1(Λ_1 + σ²I_k)Γ_1' + σ²Γ_2Γ_2'.

Therefore, the first k eigenvalues are λ_i + σ², i = 1, 2, . . . , k, where Λ_1 = diag(λ_i), i = 1, 2, . . . , k, and the last m = p − k eigenvalues are σ². Anderson derived the test for testing the equality of the last m eigenvalues. The test statistic is

Λ_k = ( Π_{i=k+1}^p l_i / l̄^m )^{n/2},   (1.3)

where l_i, i = 1, 2, . . . , p, are the eigenvalues of the sample covariance matrix S and l̄ = Σ_{i=k+1}^p l_i / m. If H_0k is true, then the limiting distribution of −2 ln Λ_k is χ²_ν, where ν = (m + 2)(m − 1)/2. These results can be summarized as

−2 ln Λ_k = n ( m ln l̄ − Σ_{i=k+1}^p ln l_i ).

Bartlett (1954) improved Anderson's test by using a multiplying factor. Bartlett's test is to reject H_0k whenever

( n − k − (2m² + m + 2)/(6m) ) ( m ln l̄ − Σ_{i=k+1}^p ln l_i ) > χ²_{1−α, ν},

where χ²_{1−α, ν} is the 100(1 − α) percentile of the χ²_ν distribution. Lawley (1956) proposed a further improvement in Bartlett's multiplying factor. Lawley's test is to reject H_0k whenever

( n − k − (2m² + m + 2)/(6m) + l̄² Σ_{i=1}^k (l_i − l̄)^{−2} ) ( m ln l̄ − Σ_{i=k+1}^p ln l_i ) > χ²_{1−α, ν}.

Acceptance of H_0k suggests that each of the last m components contains the same amount of information. If these eigenvalues are very small, then little information is lost by discarding the corresponding principal components. The first k components should be retained for further analysis. A sequence of tests can be conducted starting with k = 0 and increasing k until the null hypothesis is accepted.

Bentler and Yuan (1996) developed a method of testing for a linear trend in the last m = p − k eigenvalues of the covariance matrix. The hypothesis of interest is

H_0m: λ_{k+i} = α + βx_i,  x_i = m − i,  i = 1, 2, . . . , m.

The likelihood ratio test statistic is

Λ_m = sup_{μ, Σ_0} L(μ, Σ_0) / sup_{μ, Σ} L(μ, Σ),

where Σ_0 = Γ_0Λ_0Γ_0' and Λ_0 = diag(λ_1, . . . , λ_k, α + βx_1, α + βx_2, . . . , α + βx_m). For both Σ_0 and Σ, L(μ, Σ) is maximized when μ = ȳ. To maximize L(μ, Σ_0), Bentler and Yuan showed that λ̂_i = l_i for i = 1, 2, . . . , k are the maximum likelihood estimators (MLEs) of λ_1, . . . , λ_k under H_0m. Maximum likelihood estimates of α and β were obtained numerically by solving the following equations:

Σ_{j=1}^m [ n l_{k+j} − (n + 1)(α + βx_j) ] / (α + βx_j)² = 0,
Σ_{j=1}^m [ n l_{k+j} − (n + 1)(α + βx_j) ] x_j / (α + βx_j)² = 0.

The asymptotic distribution of −2 ln Λ_m is χ²_ν with ν = m − 2. If all x_i are equal to zero, then the hypothesis H_0m is equivalent to H_0k as considered by Anderson (1963).

Boik (2002a) proposed a method for modeling the eigenstructure of several covariance matrices simultaneously. The proposed model for the ith covariance matrix is as follows:
Σ_i = Γ_iΛ_iΓ_i',   i = 1, 2, . . . , g,

where the eigenvector matrices Γ_i and the eigenvalue matrices Λ_i may share certain properties. The model extends common principal components (Flury, 1988) and subsumes Anderson's (1963) model, Lawley's (1956) model, and Bentler and Yuan's (1996) model as special cases when g = 1. Boik presented an algorithm to compute maximum likelihood estimates of covariance parameters and also gave likelihood ratio tests including Bartlett corrections. Several of his results are referred to throughout this thesis.

The third method is based on the cumulative proportion of total variance. If λ_1 ≥ λ_2 ≥ · · · ≥ λ_p are the ordered eigenvalues, then the cumulative proportion of the first k eigenvalues, δ_k, and of the last m = p − k eigenvalues, δ_m, are estimated by

δ̂_k = Σ_{i=1}^k l_i / Σ_{i=1}^p l_i   and   δ̂_m = Σ_{i=k+1}^p l_i / Σ_{i=1}^p l_i,

where l_i, i = 1, 2, . . . , p, are defined in (1.3); the population proportions δ_k and δ_m are defined analogously in terms of λ_i. Anderson (1963) proposed a test statistic for the hypothesis

H_0m: δ_m = δ_{m,0},   (1.4)

where δ_{m,0} is a known constant. If δ_m is small, then little information is lost by discarding the corresponding principal components. Anderson showed that if λ_k > λ_{k+1}, then δ̂_m is asymptotically distributed as

δ̂_m ∼ N( δ_m, V(δ̂_m) ),   V(δ̂_m) = [ 2δ_m² Σ_{i=1}^k λ_i² + 2(1 − δ_m)² Σ_{i=k+1}^p λ_i² ] / [ n ( Σ_{i=1}^p λ_i )² ].   (1.5)

The test rejects H_0m whenever |z*| > z_{α/2}, where

z* = (δ̂_m − δ_{m,0}) / √V̂(δ̂_m),   (1.6)

z_{α/2} is the upper 100α/2 percentage point of the standard normal distribution, and V̂(δ̂_m) is an estimator of the variance of δ̂_m, which can be computed by replacing λ_i by l_i in the variance of δ̂_m in (1.5).

Sugiyama and Tong (1976) studied an approximate distribution of the cumulative proportion of the first k eigenvalues, δ̂_k, and interpreted this quantity as a measure of the amount of information in the k retained components. Assuming that the eigenvalues are distinct, they derived a perturbation expansion for the distribution of δ̂_k that is accurate to O(n^{−3/2}).

Huang and Tseng (1992) devised the following method of selecting the number of components to be retained. Let δ̂_k = Σ_{i=1}^k l_i / Σ_{i=1}^p l_i, and suppose that the first q components are considered important in the sense that δ_q ≥ ω, where ω is a fixed constant and q is an integer less than p. Their decision rule is to select k components, where k is the smallest integer that satisfies δ̂_k > c and c is a fixed constant to be determined. They discussed how to determine N and c such that the minimum probability of retaining the important components is at least some specified level P*, 0 < P* < 1. That is,

min { inf P_λ(δ̂_k > c) } ≥ P*.   (1.7)

Huang and Tseng showed that if ω > 0.5, k ≤ p/2, and N and c satisfy

c ≤ ω{ 1 − (1 − ω) √( 2p / [(p − 1)(N − 1)] ) Φ^{−1}(P*) },

where Φ(x) is the c.d.f. of the standard normal distribution, then (1.7) holds.

In this thesis, a likelihood approach for making inferences about the cumulative proportion of total variance is given. The hypothesis of interest is

H_0: C_1'λ / (1_p'λ) = c_0,   (1.8)

where C_1 is a matrix of known constants and c_0 is a vector of known constants. This hypothesis is more flexible and general than the hypothesis in (1.4). If C_1 is (0_k' 1_{p−k}')' and c_0 is a scalar, then the hypothesis in (1.4) is subsumed as a special case of the hypothesis in (1.8). Specific goals are to obtain maximum likelihood estimates (MLEs) of covariance parameters, to construct a likelihood ratio test of the hypothesis, and to construct confidence intervals for C_1'λ / (1_p'λ). The MLEs of the parameters and a likelihood ratio test for the hypothesis in (1.8) are given using one of the parameterizations of eigenvalues presented in Chapter 2. The parameterizations allow for arbitrary eigenvalue multiplicities.
A Bartlett correction is derived to improve the 12 likelihood ratio test. The tests are inverted to obtained confidence intervals. The like­ lihood ratio intervals and Bartlett-corrected intervals are compared with Anderson’s intervals inverted from (1.6) by means of simulations. The results of the simulations are described in Chapter 5. Reduction in the Number of Original Variables A principal component is a linear combination of all of the original variables. Thus, interpretation and subsequent data analysis involve all of the variables even if some components are discarded. Not only is it useful to reduce the number of components but it also is useful to reduce the number of the original variables. If the first k principal components have zero coefficients on a set of original variables, then those variables are redundant. These redundant variables can be eliminated from the study and need not be collected in any subsequent studies. The term “redundancy” has been employed in other contexts , e.g. multivariate regression (Lazraq, 2001), discriminant analysis (Pujikoshi, 1985), canonical correlation (Van Den Wollenberg, 1977; Gleason, 1976), and growth curve analysis (Pujikoshi and Rao, 1991). The term takes a different meaning in each of these contexts. In this thesis, the principal components of y are defined as in (1.2). It follows that y = Pz. The coefficient for obtaining the ith variable from the j th principal component is 7^ and is called a loading. A redundant variable is defined to be an original variable 13 whose loadings on the first k principal components are zero. Two approaches for determining the number of variables and which variables to retain will be discussed, namely the rejection approach and the hypothesis testing approach. A rejection approach is a set of cut-off rules for choosing how many and which variables to be retained. A hypothesis test is a method of inference to decide which of two complementary hypotheses is true. The rejection approach will be described first. Beale, Kendall and Mann (1967) developed cut-off rules that specify the number of variables to be discarded. However, they focused primarily on multiple regression analysis rather than on principal component analysis. In multiple regression with p independent variables, they suggested selecting the set of r variables that maximizes the multiple correlation of the dependent variable with the r independent variables. An extension of this rule to principal components is to retain set of r variables which maximizes the minimum multiple correlation between r selected variables and any of p — r discarded variables. They also mentioned another rejection approach for use in principal component analysis. With all p variables, p principal components are obtained. If pi eigenvalues are smaller than some number, A0, then the last Pi components are inspected. Starting with the one corresponding to the smallest eigenvalue, then the next component corresponding to the second smallest eigenvalue and so forth, the variable that has the largest coefficient in the component and has not already been deleted by a previously considered component is deleted and the 14 number of variables is reduced fromp to p—P1. Another principal component analysis is done on the remaining p — pi variables. Similarly, if p2 eigenvalues are smaller than the A0, P2 variables associated with the largest coefficients in the last p2 components are rejected. The process continues until all eigenvalues are greater than the A0. 
The number of variables is reduced from p to r = (p - Pi - • • • — p*). The value of r is determined by choice of A0. Jolliffe (1972) discussed the appropriate A0 based on simulation studies. Jolliffe (1972) discussed eight rejection methods for deciding which variables to be discarded. The eight rejection methods were divided into three groups: namely . multiple correlation methods (Al and A2), principal component methods (BI, B2, B3, and B4), and cluster methods (Cl and C2). These methods are briefly described below. o Multiple Correlation Methods — Method Al is the method of Beale, Kendall and Mann (1967). — Method A2 is a stepwise method. In multiple regression with p independent variables, a subset of independent variables is selected , in each step, such that the variable having maximum multiple correlation with the remaining variables is deleted. The process is repeated until r variables are retained. o Principal Component Methods — Method BI also is based on Beale, Kendall and Mann (1967) using PCA. 15 — Method B2 is the same as Method BI except that only one principal com­ ponent analysis is done. If k variables are to be retained, then the last p—k components are inspected. For each of these p — k components, starting with the last component, the variable that has the largest coefficient in the component and has not already been deleted by a previously considered component is deleted. — Method B3 uses the last p — k components. The sum of squares of coeffi­ cients of each of the p variables in the last p — k components are computed. The p — k variables that have the largest sum of squares are rejected. - Method B4 uses the first k components. For each of k components, starting with the first component, the variable that has the largest coefficient in the component and has not already been selected by a previously considered component is selected. Variables that are not selected are rejected. o Cluster Methods - Method Cl is a single-linkage method (Seber, 1984). A measure of sim­ ilarity between two clusters of variables X and Y is defined by rxy such that Txy = max Vi j , (1.9) where is the correlation coefficient between variable i and j . For each of p(p - l ) /2 pairs of p variables,,rxy are computed. Two clusters having . the maximum rxy are combined into a single cluster. For the new set of 16 clusters, the process returns to calculate rxy and so forth. The process continues until k clusters of variables remain. The principal component analysis is based on k variables, one chosen from each cluster. Jolliffe described 3 ways to select a variable from each cluster. - Method C2 is an average-linkage method (Seber, 1984). It follows the same steps as Method Cl except that rxy in (1.9) is replaced with _ YUex ^ j e Y rij where pi ,p2 are the numbers of variables in X and Y, respectively. These rejection methods were tested on artificial data. The artificial data were con­ structed so that certain variables were linear combination of other variables. For example, if P i = P 2 + e, where e is a random disturbance, then either pi or p2 may be discarded without loss of information. It was shown that several rejection methods eliminated precisely those certain variables. In general, Method A2 is the most con­ sistent method because it only retained good or best subsets. Method B4 retained best subsets more often than Method A2 but it retained moderate or bad sets fairly frequently. 
Jolliffe (1973) discussed the five methods, A1, B2, B4, C1, and C2, which had been used successfully on artificial data in Jolliffe (1972). He applied these methods to four sets of real data. The methods were equally effective with real and artificial data, although none of the rejection methods was apparently better than the others.

McCabe (1984) proposed methods to select a subset of variables, called principal variables, that contains as much information as possible. McCabe gave four optimality criteria to evaluate all possible subsets. These criteria are based on the conditional covariance matrix of the variables not selected, given those selected. Let Σ be partitioned as

Σ = ( Σ_11  Σ_12 ; Σ_21  Σ_22 ),

where Σ_11 is the covariance matrix of the selected variables and Σ_22 is the covariance matrix of the discarded variables. The conditional covariance matrix of the variables not selected, given those selected, can be written as

Σ_22.1 = Σ_22 − Σ_21 Σ_11^{−1} Σ_12.

Let θ_i, i = 1, 2, . . . , p − k, be the eigenvalues of Σ_22.1 and λ_i, i = 1, 2, . . . , p, be the eigenvalues of Σ. McCabe proposed the following four criteria:

1. min |Σ_22.1| = min Π_{i=1}^{p−k} θ_i,
2. min tr(Σ_22.1) = min Σ_{i=1}^{p−k} θ_i,
3. min ||Σ_22.1||² = min Σ_{i=1}^{p−k} θ_i²,
4. max Σ_i ρ_i² = max tr(Σ_11^{−1} Σ_12 Σ_22^{−1} Σ_21),

where the ρ_i are the canonical correlations between the variables not selected and those selected. The percentage of variance explained by a set of k principal variables is

( 1 − Σ_{i=1}^{p−k} θ_i / Σ_{i=1}^p λ_i ) × 100%.

This percentage can be compared to the variation explained by the first few principal components.
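As an informal illustration of McCabe's criteria, the following Matlab sketch (not part of the original text) evaluates the four criteria for a single candidate subset. The covariance matrix S and the index set sel are hypothetical; in practice every subset of a given size, or a stepwise search, would be compared on these criteria.

    % Minimal sketch evaluating McCabe's criteria for one candidate subset.
    sel  = [1 2 3];                          % hypothetical selected variables
    disc = setdiff(1:size(S,1), sel);        % discarded variables
    S11 = S(sel, sel);   S12 = S(sel, disc);
    S21 = S(disc, sel);  S22 = S(disc, disc);
    S221 = S22 - S21 / S11 * S12;            % conditional covariance of discarded given selected
    theta = eig(S221);
    crit1 = prod(theta);                     % |Sigma_22.1|
    crit2 = sum(theta);                      % tr(Sigma_22.1)
    crit3 = sum(theta.^2);                   % squared norm of Sigma_22.1
    crit4 = trace(S11 \ S12 / S22 * S21);    % sum of squared canonical correlations
    pctvar = (1 - sum(theta) / sum(eig(S))) * 100   % percent variance explained by the subset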
Anderson (1963) gave the asymptotic distribution of the sample eigenvalues and eigenvectors when the population eigenvalues are equal in sets. Anderson considered the hypothesis Ho: 7x = 7 0, where 7% is the eigenvector of the first principal com­ ponent and 7 0 is a specified vector such that 7o70 = I. If the hypothesis is true, then rc[ii7o'Sf_17o + ^r1To-S1To ~ 2] has the limiting Chi-square distribution with degree of freedom p — I, where Z1 is the first eigenvalue of the sample covariance matrix, S. Similar results can be applied for any other eigenvector corresponding to an eigenvalue with multiplicity I. Tyler (1981) modified and extended Anderson’s results. Tyler derived asymptotic procedures for testing more general hypotheses concerning eigenvectors. Let H be a p x p matrix and W be a p x p positive definite symmetric matrix such that W H is symmetric. These conditions ensure that the eigenvalues and eigenvectors of H are real. Let Ai > A2 > • • • > Ap be the ordered eigenvalues of H and let A be a 20 p x g matrix with rank q. Lastly, let w be a subset of m integers (I < m < p) from {1,2, . . . ,p} such that Aj ^ Xj for all i G w, j ^ w. Tyler considered the following two hypotheses: for q m , Hq : The eigenvectors of H associated with the m eigenvalues, Aj, i G w, lie in the subspace generated by the columns of A. Let f31} f32, ■ ■ ■, Ppbe the eigenvectors of H and let bi, b2, . . . , bp be the eigenvectors of H n, where V^vec(Hra - H ) A N(0, S). Denote the p x m matrix whose columns are Pi for i £ w by Pw and denote the p x m matrix whose columns are bj iov i £ w by Btu. Let di > d2 >■■■> dp be the ordered eigenvalues of H ra, P 0- = P ^ P 1wPwY 1P1w, and P 0 = E ieiy A = Bto(BJ0Bw) -1BJu. Then the test statistic for H0 is Tra(A) = n {vec [(Jp - P 0)A ]}' [s(A )] vec [(Jp - P 0)A] , (1.10) 21 where S(A) = (A'& fp)c;,s Cw(AeJp), Cfra = 53 2 3 K ~ A j Y 1P i ® p ' j - itzW j f w Tyler showed that, under H0, Tra(A) X%>-m)q- The test is to reject H0 if r = rank(P 0 A) < $ or if r = g and Tra(A) > Xi-a,(p-m)q- The hypothesis Hg can be tested by using the following approach. Let B be a fixed p x ( p - q ) matrix with rank p — q whose columns are orthogonal to A, i.e. A 1B — 0. The hypothesis Hg can be rephrased as Ho : The columns of B lie in the subspace generated by the set of eigenvectors of H 1 associated with p — m eigenvalues, A%, i ^ w, and can be treated as H0. The test statistic can be simplified if a random sample has been drawn from an elliptical distribution. Denote the covariance matrix by H. Let k be an estimator of k, the kurtosis parameter. An estimator of the covariance matrix of Vec(JJra) can be written as S = (I + k)(Ip2 + J(p,p))(JJra O H n) + it Vec(JJra)[Vec(JJra)]'. The test statistic in (1.10) can be simplified as Tra(A) = n(l + k) Y i Aj1 t i [ A 1P iA l A 1 B wDjB 1wA ] ' 1} , itw (i.n ) 22 where Dj is an m x m diagonal matrix with entries di/(di — dj)2, j w. For g = m = I, Tn(A) is asymptotically equivalent to the statistic given by Anderson (1963). Flury (1986) discussed the test for redundancy of variables in the comparison of two covariance matrices. Let S 1 and S 2 be the covariance matrices of two indepen­ dent p-vectors. Flury analyzed the eigenvectors of S ^ 1S 2. Let {(3j}Pj=1 be a set of eigenvectors of S ^ 1S 2 and partition {/3J}^ =1 into the first p — q and the last q rows as follows: W P _ J=I ~~ 0 il/3j2 P J=I The hypothesis of interest is H0(p, q) : Pj2 = 0 for all j G w. 
(1.12) This hypothesis can be formulated in the form Hg of Tyler (1981), by putting A = ( I r- , 0 ')'. Then by using S = ( J9 ) ’ (L13) the hypothesis can be rephrased as H0 (p, q) : The columns of B lie in the subspace generated by the eigenvector a ;, (i ^ w) of S 2S ^ . 23 The test statistic in (1.10) can be applied. Let {bj}^=1 be a set of eigenvectors of S i 1S 2 and partition {bj}^=1 into the first p — q and the last q rows as follows: {M t= i = {- J bHbJ 2 J=I The test statistic is -R(P, q) kidjj T k2didj (di - dj)2 b^b i2 - i b J 2) where = limnj_>O0 for i = 1,2 and dj, j = 1 ,2 ,... ,p, are the eigenvalues of S 2S 11. Under H0(p, q), R(p,q) Xhg- Fujikoshi (1989) reviewed the problem of testing the hypothesis of redundancy in various multivariate situations including principal component analysis. Let {7^ be a set of eigenvectors of E and partition {7-/}j=1 into the first p - q and the last q rows as follow: p The hypothesis of redundancy of the last q variables in the first k components can be written as Ho (A;, q) : 7 ^ = 0 for all j 6 w, w = {1 ,2 ,.. . , k}, which is equivalent to H0 (A:, q) : The columns of B lie in the subspace generated by the eigenvectors Ti) (* ^ w) of E, (1-14) where B is defined in (1.13). Fujikoshi used the test statistic in (1.11) to test the hypothesis. Let k and 7^ i = 1 ,2, . . . ,p, be the eigenvalues and eigenvectors of S, 24 respectively. The test statistic can be rewritten as Tn{B) = n ^ (l l l S) where under Ho(A,g), Tn(B) -A Schott (1991) also employed the test statistic in (1.11) to test the hypothesis that in each of the first k components have zero loadings on the last q original variables. This hypothesis is equivalent to (1.14). The test is to reject H0(A:,q) if r — ran k (rr 'B ) < g, or if r = g and T*,, = > x L „ ,ia, (1.16) j£w where B is defined in (1.13) and Dj is an m x m diagonal matrix with entries h/ (k—lj)2, j ^ w, w = { 1 , 2 , , k}. The statistic T^q in (1.16) is a matrix expression for Tn(B) in (1.15). A Bartlett adjustment was proposed based on the idea that if a test statistic T has a mean which can be expressed as E(T) = o { l + ^ + 0 (n -3/2)} , (1.17) where a is the asymptotic mean and c is a constant, then the mean of adjusted statistic, T * = ( l - T, (1.18) approaches a. It follows from Tkiq that the asymptotic mean of Tkiq in (1.16) is a = kq. An expression for c can be found in Schott (1991). Schott compared the unadjusted statistic, Tkiq, and adjusted statistic, T*, under several conditions using 25 simulated data. He showed that, in general, the Bartlett adjusted statistic performs better than Tktq with respect to Type I error. Diimbgen (1995) proposed a likelihood ratio test for principal components of a matrix S . Let f2 be the set of all symmetric matrices in Rpxp and 0{p) be the set of all orthogonal matrices in Rpxp. For S G fi, let F G 0{p) such that F 'S F = A, where A = diag(Ai, A2, . . . , Ap) and Ai > A2 > • • • > Ap. Let m = (mi, m2, . . . , ma) be a partition of {1,2,... ,p} into o > I sets. The hypothesis of interest is Ho: S = FAF', where F = ®-L1Fii, A = GiL1Aii, Aii = diag(Ai) for i G mi, Fii is an Tni X m i matrix, Fii G Oimi), 5j)“=1 Tni = p, and ® stands for the direct sum. Let h > k > • • • > Ip be the eigenvalues of S and define /x = arg min \\u - Am(5)||2,z/6X(n) where \{ S ) is the vector of the ordered eigenvalues of S , and Am(S) = ^ A(Si^ )iJgmi ^ A(Siy)iJgma X A(Sy)iJgms J The exact computation of is described in Diimbgen (1995). 
The likelihood ratio test is to reject H0 whenever tk(S) < c{a), where tkiS) - Y M ^ ) . i=l 26 The test statistic requires a simulation to obtain critical values. In this thesis, a likelihood approach for making inferences about the covariance pa­ rameters is proposed and is applied to redundancy test. Let A = diag(Ai, A2, . . . , Ap) be the diagonal matrix of the eigenvalues of S and T* be a set of the corresponding eigenvectors. The subspace spanned by the columns of a matrix, T , is denoted by TZ(T). Let A be a known p x q semi-orthogonal matrix with rank q, and M* be a p x m semi-orthogonal matrix such that TZ(M*) spans the subspace generated by m specific columns of I v. The hypotheses of interest are Ho : A G Tl(TstM ie) for q m. (1.20) It follows from (1.20), see Tyler (1981), that r*M* G Tl(A) <=> A=E Tl(T*M*c), where Ac and.M"*6 are orthogonal complement to A and Af*, respectively. That is, A cA cl = Ip - A A and M*c M*0' — Ip Accordingly, the hypothesis Hg in (1.20) can be rewritten and treated as the hypothesis H0 in (1.19). Only hypothesis Hg in (1.19) is employed in the remainder of this thesis. To match the theoretical setup in the later chapters and without loss of generality, M* can always be equated to the first m columns of Ip. This simplification is possible because the columns of T* and the rows of AF can be permuted to satisfy Am oTl(M*) = Tl 27 If H{M*) then replace T*M* by TM , where T = T*P and M = P 1M*, where P is a p x p permutation matrix. Choose P so that K(P 'M *) I . In the remainder of this thesis, hypothesis (1.19) will be written as H0: A g n {TM ), where H(M) = n ( (1.21) This hypothesis is general and the hypothesis of redundancy is subsumed as a special case. In Chapter 3, an algorithm for computing MLEs of the covariance parameters satisfying multiplicity and other constraints is proposed under H0 in (1.21). A likeli­ hood ratio test and a Bartlett correction for the test of (1.21) are also developed in Chapter 4. The hypothesis of redundancy can be written as (1.21) in the following manner. First, the eigenvalues will be reordered so that Ai < A2 < • • • < Ap. This order is used throughout the remainder of the thesis. The rational for reordering the eigenvalues is described in Chapter 2. Let F be a matrix whose columns are the corresponding eigenvectors. Then, by equating A- to A = (Iq O1)1, the hypothesis in (1.21) matches with the hypothesis in (1.14). This hypothesis states that each of the first k compo­ nents has zero loadings on the last q original variables. The redundancy hypothesis is identical to that tested by Fujikoshi (1989) and Schott (1991) but it differs from the variable selection approaches of Jolliffe (1972) and McCabe (1984). To understand 28 the distinction, first note that it follows from (1.2) that y = Tz. Partition y as where y i is (p — q) x I and y2 is q x I. Partition z as z = Zi Z2 J where Zi is k x I and Z2 is (p — k) x I. The last q variables are redundant if their loadings on the first k principal components are zero. That is, the last q variables are redundant if r K rO1 r : ) ■ P-22) It follows from (1.2) that var y Z = var Ipr' y = TAT1 PA \ Ar' a J ’ where T is defined in (1.22). The conditional covariance of y2 given Z2 can be written as var(y2N ) = var(y2) - cov(r/2, ^ jv a r (Z 2)) 1Cov(Z2l^2)- If the last q variables are redundant, then var(t/2|z2) = F22A2Pg2 — P22A2A21A2P22 = 0. That is, if y2 is redundant with respect to Zi then var(y2|z2) = 0. 
On the other hand, Jolliife (1972) and McCabe (1984) reduced the number of 29 variables in such a way that the variables can be discarded if all linear functions of y can be approximately reproduced as linear combination of . It can be shown that satisfaction of the constraint (1.22) does not guarantee that any of McCabe criteria (see page 17) are optimized. For example, if F satisfies (1.22), then the conditional covariance of j/2 given can be written as 322.1 = (F% ,A 2% ) \ which can have norm as large as ( r 22A^1Fz22) = tr(A2) = (p — &)A2 if all eigenvalues in A2 are equal to A2. Conversely, ||X22.i||2 can be small even though redundancy is not satisfied. For example, suppose that y = ^ y + e y ’ w^ere var(yi) = Xn , var(e) = aXn , and a is a scalar. The covariance matrix of y can be written as X = <8> Xn .I l + o Let F n be the matrix of eigenvectors of Xn . Then, the eigenvector matrix of X, F, can be expressed as F = Z X l + \ A + f I - V L T f ) The conditional covariance of y2 given y i can be written as ® rn . X22.i — (I + o)Xn — Xn X111Xn — oXn . McCabe criteria can be satisfied by choosing o to be a small number. If a is close to zero, then F approaches r - i 30 which does not satisfy the redundancy constraint in (1.22). Accordingly, the criteria of Jolliffe and McCabe were not constructed to reduce the number of variables with respect to redundancy. The likelihood ratio test for redundancy is compared with Schott’s tests in (1.16) and (1.18) using simulated data. A simulation study of the effectiveness of the Bartlett correction is described in Chapter 5. To illustrate the dimensionality reduction problem, two examples are described below. Example I A large number of variables were observed in a study of the quality of pictures produced by a photographic process. The data were originally presented by Jackson and Morris (1957) and were discussed by Schott (1991). The procedure for a check on the process was as follows: A film strip was given a graded series of exposures to white light and was processed. Optical densities were measured through red, green, and blue filters at the high-density portion of the characteristic curve (shadow areas), at the middle-tone portion of the curve (average picture density) and at the toe portion of the curve (highlights, whites, etc.). There were p = 9 measurements: three density levels and three colors at each level. The sample covariance matrix based on N = 109 was given in Jackson and Morris (1957). The eigenvalues and cumulative proportions of total variance are given in Table I. The eigenvectors of the first two 31 Table I. Eigenvalues of Covariance Matrix: Photographic Example. Component Eigenvalue Cumulative proportion of total variance I 878.52 0.6092 2 196.10 0.7452 3 128.64 0.8344 4 103.43 0.9062 5 81.26 0.9625 6 37.85 0.9888 7 6.98 0.9936 8 5.71 0.9976 9 3.52 1.0000 Table 2. Eigenvectors of the First Two Principal Components. Eigenvector Component High density Middle tone Toe portion red green blue red green blue red green blue I 0.305 0.653 0.483 0.261 0.324 0.271 0.002 0.006 0.014 2 -0.486 -0.151 0.588 -0.491 -0.038 0.373 0.057 0.054 0.088 principal components are presented in Table 2. The information in these tables leads to the following two questions: 1. How many components should be retained? 2. How many and which variables can be considered as redundant? Consider the first two principal components. These components account for .75% of the total sample variance. 
Table 2 reveals that first two components have small coefficients on the last three variables. The question is whether these three variables can be discarded. These two questions have been addressed in several papers referenced 32 earlier. This thesis proposes an alternative method for answering these questions. This method is based on a likelihood approach and is discussed in Chapters 2-4. Example 2 The 1984 Olympic track records dataset for women was examined. The data were first analyzed using PCA by Dawkins (1989). There were p = 7 track events; namely, 100 meters, 200 meters, 400 meters, 800 meters, 1500 meters, 3000 meters, and marathon (42,195 meters). There were 55 countries having complete records for all events. The data consist of time to complete each event by athlete from each country. Dawkins normalized the data to have mean 0 and standard deviation I and then analyzed it. Naik and Khattree (1996) discussed an alternative choice to com­ pare the athletic performance of nations. Instead of using time as the response, speed defined as distance per unit time was used. Speeds for the women track events and coefficients in the first two principal components are presented in Naik and Khattree (1996). In the present context, suppose that it is believed that variability in speed over track events can be summarized in terms of linear, quadratic, cubic, and higher order functions. A profile plot of the expected mean speed against log(distance) ap­ pears in Figure 2. It is of interest to examine the polynomial components of the response profile. In this example, the matrix C* containing the coefficients of orthog­ onal polynomials is 33 Figure 2. A Profile Plot of the Expected Mean. 9 T 8.5 ' 8 - 7.5 - s 7" 6.5 - 6 - 5.5 ■ 5 ■ 4.5 - 4F- 2.5 3 3.5 log(distance) 4.5 5 f -0.4742 0.5426 -0.4769 0.2996 -0.1380 0.0402 \ -0.3332 0.1609 0.2878 - -0.5738 0.5053 -0.2298 -0.1921 -0.1229 0.4780 - -0.1101 -0.5214 0.5410 -0.0511 -0.3087 0.2700 0.4374 -0.1839 -0.6791 0.0768 -0.3924 -0.1152 0.3879 0.5930 0.4264 0.2179 -0.3914 -0.5938 --0.4752 -0.2612 -0.0995 ^ 0.7558 0.5120 0.1502 0.0341 0.0062 0.0008 ) The first column of C* contains coefficients of the linear contrast. The second column of C* contains coefficients of the quadratic contrast etc. The matrix C* can be computed by using the Matlab program given in Appendix C. The eigenvalues of C*'SC* and cumulative proportions of total variance are given in Table 3. The eigenvectors of the first two principal components are presented in Table 4. Two questions of interest are addressed. First, how many principal components should be retained? The first component accounts for 55%. The first two components account for 81%. Second, which polynomials contribute to the largest 34 Table 3. Eigenvalues of Covariance Matrix: Women’s Track Example. Component Eigenvalue Cumulative proportion of total variance I 0.1246 0.5509 2 0.0586 0.8102 3 0.0240 0.9164 4 0.0091 0.9566 5 0.0061 0.9836 6 0.0037 1.0000 Table 4. Eigenvectors of the First Two Principal Components. Component Eigenvector linear quadratic cubic quartic quintic septic I -0.9784 0.0065 0.1738 -0.0764 -0.0783 -0.0229 2 0.0316 0.8985 0.0666 -0.3833 0.2005 0.0059 principal component. The data will be analyzed using the approach in Chapters 2-4. The conclusion of this example will be discussed in Chapter 5. 35 CHAPTER 2 PARAMETERIZATION In this chapter, two parameterizations of eigenvalues and one parameterization of eigenvectors are presented. 
Expressions for the first three derivatives of each of the parameterizations are given in Theorems 1-8. These theorems are proven and the results are employed in Chapter 3 to estimate parameters. To make the theorems distinct from the text, theorems and proofs will be set in italic letters. The notation and terminology used here closely parallel that of Boik (2002a, 2002b). Let S be th ep x p covariance matrix with eigenvalues Ai , . . . , Ap and suppose that 7 1, . . . , 7p is a set of orthonormal eigenvectors corresponding to these eigenvalues. Then, the spectral model for the covariance matrix is as follows: where A — diag(Ai, . . . ,Ap) and T = (7i 7% - - 7P)- Let pt and cp be vectors of unknown parameters. The quantities, T and A, are parameterized as functions of /i and (p such that T = P (/Li) and A = A(<£>). The parameters are arranged in a vector, 0: Spectral Model S = TAT (2 .1) 36 where dim(-) stands for the dimension of vector and = za Parameterization of Eigenvalues Denote the p x I vector of eigenvalues of S by A. Then, p p j = i j=i where e? is the j th column of I p. Taking the vec of both sides yields vec(A) = ® Bpj )Xj j ' = i = B e ? ® 4 ) < a . j=i Accordingly, the relationship between A and A is vec (A) = L \ and A = ^vec(A),- (2.2) where p i: = ]T (e ; ® e^)^' and ^ = Jp. (2.3) j ' = i Two parameterizations of A are used in two applications of the dimension reduc­ tion in principal component analysis. These two structures for A are described in each of the applications below. 37 Reduction in the Number of Variables It is assumed that A is a differentiable function of cp, where y is a zv2 x I vec­ tor of unknown parameters. Boik (2002a) introduced several possible structures for eigenvalue parameterization. The structure used here is as follows: X = T1 exp{©(T2y)}, (2.4) where T1 and T2 are full column-rank eigenvalue design matrices with dimensions PXq2 and q2 x v2, respectively and © indicates that the operation is to be performed element-wise. For example, if it is a p x I vector, then exp (©it) = (eUl e“2 • • • eUp)'. The following examples show how to construct T1 and T2. Suppose that A contains u2 distinct values. Let m = (To 1 m2 • • • Tnv2)' be the vec­ tor of eigenvalue multiplicities, where Y^iLi 77L = P- For example, if A = (10 10 5 I I)' then m = (2 I 2)'. The parameterization in (2.4) allows arbitrary eigenvalue multi­ plicities. Consider the p x I vector of A with multiplicity m and ( A1 0 > ( Im1 S x p (^1) o XA2 " , -Tm* expfpg) V 0 Ap j V 0 Imv2 SXp(P1Z2) y If the distinct eigenvalues are unordered, then T1 and T2 can be written as "2 T1 = @ Imi and T2 = I U2. Z=I 38 To order all the eigenvalues from small to large when m = I v and p = 4 = z/2; T1 and T2 can be written as Ti ( I I I V i 0 1 i i o X o o i / and T2 = I u (2.5) To order all the eigenvalues from small to large when m = (I I 2)', p — 4, and z/2 3; T1 and T2 can be written as / I 0 0 \ I I 0Ti = and T2 = I.Z/g'I I I V i i i To ensure that the first eigenvalue is the largest when m set T1 and T2 as I p and p = 4 = z/2; Z I 0 0 o X I - I 0 0 I 0 - I 0 V 0 0 - I ) and T2 — I U2 • To order all the eigenvalues such that Ai = exp{a + /3(p - %)}, % = 1,2, where a and /3 are constants; T1 and T2 can be constructed as ,P, T1 = Jp and T2 = ( lp p), where p is a vector with elements p — i, i = 1 ,2, . . . ,p. 
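To make the eigenvalue parameterization concrete, the following short Matlab sketch (not part of the original text; the numerical value of φ is hypothetical) constructs the design matrices of the small-to-large ordering example in (2.5) and evaluates λ = T_1 exp{⊙(T_2 φ)} as in (2.4).

    % Minimal sketch of the eigenvalue parameterization lambda = T1*exp(T2*phi).
    p   = 4;
    T1  = tril(ones(p));          % lower-triangular design matrix: orders eigenvalues small to large
    T2  = eye(p);
    phi = [0.5; -1; 0.2; 1];      % hypothetical parameter vector
    lambda = T1 * exp(T2 * phi)   % lambda_1 <= lambda_2 <= lambda_3 <= lambda_4
    % Repeating rows of T1 imposes eigenvalue multiplicities; for m = (1 1 2)' use
    % T1 = [1 0 0; 1 1 0; 1 1 1; 1 1 1] with T2 = eye(3), so that lambda_3 = lambda_4.

Under this construction the eigenvalues are positive and nondecreasing by design, regardless of the value of φ.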
To order all eigenvalues in Example I (the study of the quality of pictures) so that the last eigenvalue is the largest, the second last eigenvalue is the second largest, and the first seven eigenvalues have no order but are smaller than the last two; T1 and T2 can be constructed as follows: / i? o7 - J 7 A T1 = I I 0 O7 I and T2 = Jg. V i i o' y 39 Expressions for the first, second, and third derivatives of A in (2.4) with respect to y/ are given in Theorem I. These results are useful for estimating parameters and for constructing likelihood ratio tests including Bartlett corrections as explained in Chapters 3-4. Theorem I (Boik 2002a, in supplem ent). The first three derivatives of A in (2-4) with respect to ip' can be written as follows: D y ' = = T1 d i ag [exp { G (T, J'} ] T2. D f = = {exp(0T2y) ® T1Y W<2>(T2 8 T2), dtp1 ® dp ' ^A dp ' ® dp' ® dp' {exp(02%)}] = e f exp{ef'T2^ j e f '. where 40 Taking the derivative of (2.6) yields 92 n D i? = T i £ > f — [exp{e?'T2V}ef'T2] 2=1 * Q2 = T1 ^ ef={ exp{ef='r2y}ef-'T2( /„ ® ep'Tz)}. 2=1 Expressing exp{ef2,T2y>} as e |2,exp{©(T2^j)} yz'e/flLs 92 U f = ^ T 1E f e f exp{©(T2¥.)} (e f '® e f ) (T 2® T2) *=1 92 = 5 3 (exp{0(T2f)} 'ep ® T1) e f ( e f ® ef*)(T2 ® T2) i = l 92 = ]T (exp{0(%i) is a differentiable function of ip, T3 and T4 are full column-rank design matrices of known constants with dimensions p x ^3 and % x g4) respectively, dim(y>) = ^2, and ^2 < Qi- It is assumed that (2.8) is consistent. That is, 3 £ such that = Co- The vector of eigenvalues in (2.9) depends on T^(T4), the vector space generated by the columns of T4. If I 93 G T^(T4), then T4 in (2.9) can be reduced to a matrix 42 that has one fewer column than the original matrix in the following manner. Let P1 = Ppo(I93) = I 93^ T1I 93 and write T4£ as T& = (Z93 - PX)TA + P1TA = (Zg3 — P l)TA + Ig3QrS Ig3Z ^- Substituting in (2.9) yields X _ ^pZ3 exp{Q((Z93 - P i )T4^ )} l ^ e x p { 0 (Z 93- f i ) l k ( ) } because exp(g^1 I z93T4^ ) is a scalar and can be canceled out. If I 93 G I Z ( T 4i) , then I 93 = T46 for some b . Therefore, if I 93 G T Z ( T 4 ) , then rank[(Z93 - P i )T4] = rank (T4 - I 93^ 1T93T4) = rank (T4 - T4 6% 1I 93T4) = rank[T4 (Z94 - ^ 1I 93T4)] = rank (Z94 — 1I 93T4) because T4 has full column—rank = t r (Z94) - tr ( I93Q2T1I 93) because 6% 1I 93T4 is idempotent = rank(T4) - I. Factor (Z93 - P 1) T4 as (Z93- P 1)T4 = u pv ;, where Up and Vp each have full column-rank. Then (Z93 - P1) ^ = UpC, where 43 Accordingly, T4 should be replaced by Up, a matrix whose columns form a basis set for Tl [(Ig3 — I 93(^1Ig3) T4]. For convenience, the remainder of the thesis still uses symbols T4 and £ instead of Up and £*. For the example in (2.5), T4 = T 2 should be replaced by T4 = ( ) , or any other full column-rank matrix that has column space equal to % ^ ^ y 1 I . Consider the second constraint. It follows from (2.8) that C X = C0 <=> C2exp{©(T4£)} = 0, (2.10) ■ To where C2 is a g3 x r matrix with rank r whose columns are a basis set for Tl [T3'(C 4 — I p C70)] . To show the equivalence in equation (2.10), substitute A from (2.9) to obtain ClT3exp{0(T4£)} l^ e x p { 0 (H £ )} ^ 4=^ C ^ exp{0(Tt£)} = c o i ;# exp{0(Tt£)} 4=^ (C (-C o i;)^ e x p { 0 (3 l£ )} = O -<==> C2 exp{©(T4£)} = 0. Let V = (Vi V2) be a Q4 x g4 nonsingular matrix of constants, where V4 has dimension $4 x r, V2 has dimension g4 x z/2, p2 = Qi - r, and V^ 7V2 = 0. 
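As a numerical check on the leading term of Theorem 1, the analytic derivative T1 diag[exp{(T2 phi)}] T2 can be compared with a central finite difference. The sketch below reuses the illustrative T1, T2, and phi from the earlier example; it is not part of the thesis code.

% Sketch only: verify the first derivative in Theorem 1 by finite differences.
p = 4;  T1 = tril(ones(p));  T2 = eye(p);
phi = [-1.0; -0.5; 0.2; 0.7];
lamfun = @(v) T1 * exp(T2 * v);
D = T1 * diag(exp(T2 * phi)) * T2;              % analytic derivative (Theorem 1)
Dnum = zeros(p);
h = 1e-6;
for j = 1:p
    e = zeros(p, 1);  e(j) = h;
    Dnum(:, j) = (lamfun(phi + e) - lamfun(phi - e)) / (2*h);
end
maxerr = max(abs(D(:) - Dnum(:)));              % should be of order h^2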
Then, T4£ can be written as T4£ = TaV V - 1Z = T4 [V1(F17F1) - 1V1^ + V2(V27V2) - 1V27^ ] 44 — T^V1T + V2 = (VzV )- 1V ^ j and r is an implicit function of y . Substituting T4^ from (2.11) in the constraint (2.10) yields O2 exp{©(T4(VT + V2(p))} = 0. Taking the derivative of the above constraint with respect to r ' yields ^{C% exp{0 (2X V T + V T ^ ( I , ® V 2) + w H^T3W 1V1 } • Applying standard procedures for differentiation yields the following results: d w d ip ' dH x d ip ' dW i d ip ' d 2 r d ip ' <8> d ip ' -W 2VpTz Wi F2, K W2 (F2 ® Iy2), and (2.16) (2.17) (2.18) (2.19) (2.20) To verify equations (2.19) and (2.20), take the second derivative of (2.15) to obtain d 2T d ip ' <8> d ip ' dW-i - ( C ^ W iV l ) - :C ^ ( f ^ gv%) d ip ' (2.21) 48 because V{V\ = I r. To compute ^ r , express Wi in term of summation as follows: 93 Wi = e f exp{e?''24f i = l Applying the product rule yields r)W 93 - g ~ = X M e fT iW e f T t ® ® I n ) ^ i = l = WsfVS <8 JT^). (2.22) Substituting (2.22) in (2.21) yields d 2r dip' ® dip1 8 f ^ ) ( ^ 8VS) = - D ^ U '^ W z iV z 8 V2). Substituting the results (2.17)-(2.20) in (2.16) yields = -w (l) ( i)8 i;2 1 ,W iV s) J T ^ ) Wi t t ) + PowHx TgWsfVS 8 V2) - P o w H x T3 W 1V1 D f 1 C'2W 2{V2 8 V2) = -2w ( p f ] 8 IpTs Wi VS) N v2 + PowHx T3 H i W2(Vr2 8 Vr2). To compute the third derivative of \ with respect to (D H f (g, + PoW Hx H f Ws (^2 ® Vz ® Vf) - 2 w H^ T3 H fW s ^ ® ® (2.29) where N*2 = + {IV2 ® N v2). Combining (2.27) and (2.29) yields D (3)A -2wy,o#V%,H(W2 ® ® V^ ] ~ 2 w { ( D ( i ) ® IpT3WiT^j + I^T3H fW 2(y2 (8) V2) ® D ^ ] } N I +wipoHxTsH^Wsiy?. ®V2® V2). Corollary I. I f C2 has rank = 0, that is there are no additional constraints, then r = 0, V2 = I gi, V2 = cu, y? = £ and the derivatives simplify to n P = WpqH xTsW 1, D m D A (3) -2w ® I^T3W i) N v2 + WpqH xTsW 2, 'x -2w { ® 1JT, W i) + [ l j ^ w 2 ® I N, +wpqH xTsW s- Since Pq defined in (2.7) is a parameter, the first three derivatives of A in (2.9) with respect to pQ are useful for estimating parameters and for constructing the tests. The results are shown in Theorem 4. 52 Theorem 4. TAe first derivative of X in (2.9) with respect to cpo can be written as (i) = dA_ = T3exp{0(T4g)} = A_ l ^ e x p { 0 M } Po and the higher order derivatives are zero. Parameterization of Eigenvectors Boik (1998) described a local parameterization of orthogonal and semi-orthogonal matrices. Boik (2002a) applied this parameterization to the eigenvectors of the co- variance matrix. Let P b e a p x p matrix whose columns consist of the eigenvectors of S and let Po be an orthogonal matrix in a small neighborhood of P. That is P ry P0. Boik’s parameterization is based on the trivial equality P = Jp P. Write the identity 1 ' I matrix as an orthogonal matrix, PoPo- Let G(yu*) = P70P, where G(ix*) G 0(p) and pb* is a vector of parameters. Then, Gijj,*) pz Ip and the parameterization of eigenvectors can be written as P = P0GOu*). (2.30) It is assumed, for now, that P0 is a known matrix. This assumption is not made in Chapter 3 when estimation is discussed. By orthogonality, G is subject to p(p + l ) /2 constraints, GG7 = I p. If each eigenvalue has multiplicity I, then G is a function of p(p — l) /2 parameters denoted by jit*. The p(p — I)/2 elements of /u* correspond to the p(p — I) /2 elements in the upper triangle of G Let r f be an implicit function of jit*, then 77* has p(p + l ) /2 elements. 
Let m = (mi m2 • • • mU2)' be the vector 53 of eigenvalue multiplicities and denote the distinct eigenvalues by ( 2, the columns of Gi can be rotated to annihilate the mi(mi — l ) /2 elements above the main diagonal of Gi2. The dimension of p* is reduced to (p2 — ra 'm )/2 . For example, if p = 6 and m = (I 3 2)', then G can be written as follows: ( V i /4 Pt Pt //g \ Vl Vt 0 0 Pt /4 Vt Vt 7/12 0 Pt /4o Vl Vt 1^3 %16 Pt P l l Vt # 0 Vh V l 7 ?7l9 0 \ 776 vh vh ^8 7721 / Gi G2 6X1 6X3 G3 ). 6X 2 . If 11/it* 1 is small, then G Ip and given /T, GG7 = Ip can be solved for rf . Solving 54 for rj* is discussed in the last section of this chapter. In general, G can be written as vec G = Aifx* + A2Tj*, (2.31) where A i and A2 are known indicator matrices with dimensions p2 x ((p2 — m 'm ) / 2 ) and p2 x (p(p+ l)/2), respectively. Because Az1A1 = I{p2-m'm)/2, A12A 2 = Ip(p+i)/2, and A 1A2 = 0, vec G satisfies A 1 vec G — jj,* and Az2 vec G = rj*. Suppose that interest is in testing the hypothesis in (1.21), H0 : A G n {VM ), where A is a known p x g semi-orthogonal matrix with rank q and M is a p x m semi-orthogonal matrix such that H (M ) The restriction A G T^-(TM) further reduces the number of entries in n*. Theorem 5 describes this reduction. First, however, note that the parameterization in (2.30) is valid for any matrix F0 in a neighborhood of F. It is most convenient to equate F0 to F. In this way, the true value of n* becomes zero. That is, F = FG (aO ) Mlft=O because G(O) = I v. In the remainder of this thesis, F0 will be taken to be F. Theorem 5. Denote C = (M cl ® AT ), where M cM cl = I p - M M 1 and F = r c M U =O - Then> A G Te(FM) GvecG = 0. . 55 Proof. It will be first shown that A G IZ(TM) 4=> A 1T M c = 0. The proof consists of the following two steps-: Step I: proof that A G TZ(TM) = > A 'T M C = 0. Consider A = (T M M 1T 1 + FM cM cT ') A . A e t z (TM ) => f m m T 'a = a = > T M cM clT1 A — 0 = > A TM c = O. Step 2: proof that A 1T M c = 0 ==> A G TZ(TM). A TM c = O =^> A = FM M T 'A = > A G %(FM). Applying the property of the vec, vecpCT Z) = (Z' ® %) vec T , (2.32) where X, Y , and Z are any matrices that are conformable for multiplication, yields A eT Z (TM ) <=> A T M c = O 4=> vec(ATM c) = 0 <*=*► vec(ATGM c) = 0 4=^ C vec G = 0. □ 56 Accordingly, G(fj,*) is further constrained. It was shown in Theorem 5 that A 6 TZ(FM) <==> CvecG = 0. These additional constraints reduce the number of parameters by (p — m)q and increase the number of implicit parameters by (p — m)q. Let V* be a (p2 — m 'm )/2 x (p2 — m 'm )/2 nonsingular matrix and V* = (V3 V4), where Vz has dimension (p2 — m 'm )/2 x Iv1, V1 = (p2 — m 'm ) /2 — (p — m)q, V4 has dimension (p2 - m 'm ) / 2 x (p - m)q, and VlVz = 0. Then, vec G can be rewritten as vec G = A 1 ju* + A 2rj* = A 1F3(F37F3) - 1F3V* + A 1F4(F47F4) - 1F4V* + A2T7* = A1F m T A 1F tT7l + A 2Iq2I (2.33) where T7 l = (V^F1) - 1F tV*, % = ?7*, and /u = (F37F3) - 1F3ZLt*. Equivalently, vec G = A 1VzfJ, + [A1F A2Jt7, where T7 = ^ ^ 1 ^ (2.34) and T7 is an implicit function of fi. Taking the derivative of the constraints with respect to T77 yields / n / \ P o T i / A t z n r v a I (2.35) where D p is the duplication matrix (Magnus and Neudecker, 1999), Dp satisfies vec H = DpVechDr, D is a symmetric matrix. By the implicit function theorem (Taylor and Mann, 1983), if the matrix of derivatives in (2.35) is nonsingular, then T7 is an implicit function of fj implying that 0 - exists. 
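The indicator matrices A1 and A2 appearing in (2.31) and (2.33) can be built directly. In the multiplicity-one case, A1 selects the strict upper triangle of G (the free parameters mu*) and A2 selects the lower triangle including the diagonal (the implicit parameters eta*). The following is a minimal Matlab sketch, with p = 4 chosen only for illustration.

% Sketch: indicator matrices A1 (strict upper triangle of G) and A2 (lower
% triangle including the diagonal) for the multiplicity-one case.
p = 4;
upper = [];  lower = [];
for j = 1:p
    for i = 1:p
        k = (j - 1)*p + i;                 % position of G(i,j) within vec G
        if j > i
            upper(end+1) = k;              % free parameter (an element of mu)
        else
            lower(end+1) = k;              % implicit parameter (an element of eta)
        end
    end
end
A1 = full(sparse(upper, 1:numel(upper), 1, p^2, numel(upper)));
A2 = full(sparse(lower, 1:numel(lower), 1, p^2, numel(lower)));
% A1'*A1 and A2'*A2 are identity matrices, A1'*A2 = 0, and A1'*vec(G) = mu,
% A2'*vec(G) = eta, consistent with the properties noted after (2.31).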
To verify that (2.35) is _d_ ( D 1vVecGG1 \ ' SDpA1F SD^A2 ' drf' \ C vecG J At=O C A 1F4 CA 2 57 nonsingular, the following results are useful: D 1pA 2 — Ipip+i)/2 and (2.36) A 2DpAi — I(p,p) A l. (2.37) The derivative matrix in (2.35) is nonsingular if and only if its rank = ~^^~+(p—m)q. The rank of a matrix is unchanged by interchange of columns, therefore Because the matrix D 1pA 2 is nonsingular with rank by (2.36), the derivative matrix in (2.35) is nonsingular if and only if rank (CAiV^ — C A 2DpA 1Vi) = (p — m)q (Ouellette, 1981). That is, is nonsingular. The matrix V* should be chosen to ensure that matrix (2.38) is nonsingular. A suitable V* and expressions for the first three derivatives of vec G with respect to fi' for a special choice of V* are given in Theorem 6 and Theorem 7. Theorem 6. Define W4 as W4 = C (Ip2 — I v^)) A 1. Express W4 in term of its singular value decomposition as W4 = U(D 0) ( ^ = UDVl> . (2-39) where C is defined in Theorem 5, D is an (p — m)q x (p — m)q diagonal matrix of singular values, and V* = (V3 V4) 6 0((p 2 — m 'm )/2 ). This choice ofV* ensures that the derivative in (2.35) is nonsingular. C{Ip2 I ip ^pf) A 1Vi (2.38) 58 Proof.' It is necessary to show that W4 — C( Ip2 — I Ai has full row-rank. The quantity W4. has full row-rank if and only if t'C (Ip2 — I(P^f)A1 — O 2 — 0. (2.40) Itfollowsfrom (1.21) that M c ° ) R J ’ where R is any (p — m) x (p — m) orthogonal matrix. Therefore, C = [M c' ® AT) (0(p-mxm) A F (2.41) Let F = (Fi F2) which have dimensions p x m andp x (p — m), respectively. It also follows from (1 .2 1 ) that A G Ti(FM) = > A E Ti(Fi) A = F1T for some T A! = T'T'i AT = [T' 0), because Fi and F2 are orthogonal. Substituting in (2.f.l) yields C = [M cl AT) = [0 Ji'] 8 [T' . 0] . (2.42) Expressing A i as ® eDeY > where j > i and w = p2 the left- hand-side of (2 .4 0 ) can be written as p p t' *=1 3=i f[0 0 ] j ( ^ - f ^ ) ) ( ^ 8 e ? ) e y = 0. 59 The following properties of the vec are applied: [X ' ® Y)a,i = vec(Y" dvec(Oi)X ) and dvec(vec(e?e%'),P,p) = (2.43) (2.44) where X and Y are any matrices. Accordingly, ] r W i ' v e c ( [ r 0 ](eH ') ( “ 1=1 j = i ^ ^ \ - t ' v e c ( [ r 0 ](e '< ) ( I R I I 6™' eT Therefore, £ £ t ' v e c ( [ T ' 0 ] ( e ? e f ) ( M ) = ^ vec ([T' OKefe?') 1=1 j = i ^ \ / / i=1 j - i \ Define T* so that t = vec T*', then tr ( r * [ r 0](e?e?') ( ^ ) ) = tr ( t *[T' 0](e?e?') ( ^ ef' ( ” ) T ’ [ r 0]e*\nr' — JP'\e-r I R ) T*[T' 0 ]^ , which implies that Finally, T* [T' 0] is symmetric. R T*[T' 0] 0 0 R T * T 0 and RT*T ' t = 0 , 60 because R is orthogonal and T has full column-rank. Hence, Wi has full row-rank and for fixed V* in (2.39), W 4V4 = UD is nonsingular. □ By Theorem 6,77 is an implicit function of jj, implying that J^r exists. Expressions for the first derivative of 77 with respect to / / can be obtained as follows: dffh dn' d v 2 djj,' 0, - D 1pA 1V3. (2.45) (2.46) To verify the above equations, take the derivative of the vec of the first constraint, GG' = Ip, and use property (2.32) to obtain d vec GG 1 <9/V d vec GG 1 — ( I p 2 + I ( p , p ) ) ( I p ® G ' ) SvecG dfi' — 0 and dfi' 2 #(j,=0 SvecG S/Li' 0, where N p — ^( I p 2 + I(p,p)) - Equivalently, m S vec G ^ S/i' PL=O 0, (2.47) (2.48) because N p = Dp(DpDp) 1D lp. Simplification yields D 1pA 1Vt + D 1pA 1V1^ + -Dp A2Qrj2dfi1 0 61 dfl2 dn' [D1pA 1Vs + D 1pA 1V^ J because of the result in (2.36). Taking the derivative of the second constraint, C vec G = O, yields &ni ,<9 vec G dii' dV2C A 1V3+ C A l Vi ^ + C A 2^ 0 0. 
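In practice, the matrix V* = (V3 V4) of Theorem 6 is obtained directly from the singular value decomposition of W4. The following Matlab sketch assumes that the constraint matrix C of Theorem 5, the indicator matrix A1, and the dimensions p, m, and q have already been formed; the variable names are illustrative.

% Sketch: choosing V3 and V4 as in Theorem 6.  C, A1, p, m, q are assumed to
% exist; Kpp is the commutation matrix I_(p,p), satisfying Kpp*vec(X) = vec(X').
Kpp = zeros(p^2);
for i = 1:p
    for j = 1:p
        Kpp((j - 1)*p + i, (i - 1)*p + j) = 1;
    end
end
W4 = C * (eye(p^2) - Kpp) * A1;        % (p-m)q x (p^2 - m'm)/2, full row rank by Theorem 6
[U, D, V] = svd(W4);
r  = size(W4, 1);                      % number of nonzero singular values
V4 = V(:, 1:r);                        % right singular vectors spanning the row space of W4
V3 = V(:, r+1:end);                    % null-space basis: W4*V3 = 0 and V3'*V4 = 0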
Replacing (2.49) in (2.50) and using the result in (2.37) gives C (Ip2 — I(PjP)) A 1V 1 W tV t dm dn' dm dii' dm dfi' —C (Ip2 — !(p^)) A 1Vs -W tV 3 0, (2.49) (2.50) because W t = UDV l and VlV 3 — 0. Then, it follows from (2.49) that dm dfi' -D fpA 1V3. Expressions for the first, second, and third derivatives of vec G in (2.33) with re­ spect to /x' are given in Theorem 7. These results are useful for estimating parameters and for constructing likelihood ratio tests including Bartlett corrections as explained in Chapters 3-4. 62 Theorem 7 (A dapted from Boik 2002a). The first three derivatives of vec G in (2.33) with respect to n ', evaluated at pi = Q, can be written as follows: = <9 vec G o f dpi' <92 vec G = (JP2 - J(p,p)) AiVr3, At=O At=O 93 vec G • = (1P2 - P )A 2D'p{Ip vec Jp ® Jp)' ® - ( P - Jp2)A2Dp(Jp ® vec Jp ® Jp)' At=Od/LA' ® d p i ' ® d p i1 D f ® /(„,„) Dg>) - (r>g> ® D f ) {/„, + (/„, ® K 1(CK 1) - 1C 1 K 1 = (Ap= - I lplliIA1Vll X where P ■ C K i = W4V4 = UD , and C is defined in Theorem 5. (2.51) Proof. Denote the ith derivative of rjj by r j^ . It follows from (2.33) that D $ = A 1Vz + A 1Vm P + A 2TiP = A 1V3 - A 2D 1pA 1V3 by (2.45) and (2.46) — (Ip2 — I(p,P^A 1V3 by (2.37). The second derivative of vec G can be written as Bg) = A i t q ^ + A2, ,^ (2.52) To compute r]P and r iP , first examine the second derivative of vec G'G: 92 vec G G d p i ' ® d p i ' d d p t ' 2Np 2JVp(]p ® G') 9 vec G d p i ' = O d(Ip ® G') d p i' J^ ® 9 vec G d p i ' + (Ip ® G') 92 vec G d p , ' ® d p , ' = 0. 63 Using the properties, {X Y ) Z = J(rjp) (Jr (g) (vec I s)' O X ) (vecF ' ® I ^ Z ) , where X is p x q and Y is r x s, and N p = N p I(p,p), yields d2 vec G'G At=Odfx' ® dpi' N t { (It ® (vec J ,) ' ® 4 ) (n g ) ® e g ) ) + e g ) } = 0 = e ; { - (Jp ® (vec J ,) ' ® J,) ( e g ) ® e g ) ) + e g )} = o = e ; { - j f + e g ) } = o , where K ■ Ht 0 vec It ® Jp)' ( e g ) 0 e g ) ) and Jfppi e g ) = — e g ) . Taking the second derivative of the second constraint, C vec G = O, yieMs ^ S2YecG dp,' ® dpi,' /t=0 ) (2) _C D q = 0. Substituting from (2.52) in (2.53) yields then { - K + A lV in f + A2Tt f ] = o, I j f = D 1pK - D 'A 1V tn f , by using result (2.37). Substituting from (2.52) in (2.54) yields C [A 1V ^ + A 2T1^ ] = 0 . Substituting rffi from (2.55) yields C (Ip2 - I m )A 1V ^ + C A 2D 1pK = O W4F4T?? + C A 2D 1pK = 0. (2.53) (2.54) (2.55) 64 Replacing W4 = UDV l yields = - [ U D y 1CA 2D ' K . Then, it follows from (2.55) that T1W = D 'K + ^ A 1F4(T O )-1CA2B lii:. Consequently, 1(2) A2B lK - P A 2D 1K [Ip2 - P ) A 2D y ip ® vec ip (g) ip)' (b ^ (gi B ^ j Similarly, D q can be obtained. It follows from (2. S3) that B§) = A 1^ ) + A 2^ . (2.56) To compute Tf1 ’ and ry2 , the third derivative o/vec G'G with respect to fj! is obtained as follows: d3 vec G'G dfi' OfJ,' ® dpi' (i=0 — 1 (^p ® (vec ip) ip) (ip2 I(p,p))" ( s g ’ e D g i ) dy.' (i=0 a (4 ® c ') Bg) d/it' A^=O 0. (2.57) For convenience, the two parts on the right-hand-side of (2.57) are labeled as Ga and Gb- Using the Kronecker product rule, Ga can be simplified to Ga = - N p (ip (vecIp)' ® Ip) (Ip2 ® i(p,p)) + ^ 2 ^ 2 ) (b ^ ® B y ) (Jy1 ® I(UiyHi)) (D g 1O-Dg') 65 Express (vec I p ) 1 as Y^ h=I (ef ® eD an^ repeatedly use the permutation matrix:. {A® B ) = I(pjr)(B A) I (Ctq), where A is r x c and B is p x q, to simplify Ga as follows: Ga = - N p (Ip ® (vec Ipy ® Ip) ® [lu3 + (I 1^ ® . Similarly, G^ can be simplified to = Np { (Zp 8 ( v e c ® ® + a g ) I . 
Combining G a and Gb yields = Np i (fp ® (vec ip)' <8 ip) [ - ( i)g ) 8 D g)) [f,, + (I,,, ® f^,^))] + (Dg) <8 i(p,p) Dg)) ] + Dg) j = D 1p [K* + D i^ = 0, (2.58) where K * = ( ip®veclp® lp)'( (d ^1® J(P1P) - ( D f ® D f ) [Jp, + (I,, « /(„ „ „ ))]) . It follows from (2.54) that C D g ) = 0. (2.59) . Replacing Dg) from (2.56) in (2.58) yields D'p [ k * + A 1W i 3) + A2T7? )} = 0. cr vec G1G dpi1 <8 dpi' <8 dpi' 66 Therefore, Substituting from (2.56) in (2.59) yields Substituting rffi from (2.60) yields = 0. C(Jp, - J(Pf))Ai-%%(3) - CAa^JRT* W4V4T7S3) - C A 2D 1pK* 0 0. Replacing W4 = UDV f yields T7J3) (U D )-1C A 2D 1pK*. Itfollowsfrom (2.60) that i f = - D 1pK* - D 1pA 1Vi (U D )-1C A 2D 1pK* Finally, replacing r j f and r j f into (2.56) yields (P - IP2)A 2D 1pK* (P - Ip2)A 2Dp (Ip ® vec I p ® Ip)' I (d Q ® I(p,p) D f^ j - (I)C?) ® D §)) [I,? 4- (J^ ® I(,ifi))] (2.60) □ 67 Corollary 2. The first three derivatives of vecG in (2.33) with respect to fj,', evaluated at fj, = 0, under constraint A e 'R (TM ), where R,(M) — % ^ ^ ^ can be written as follows: ,(i) <9 vec G djj,' = (IP2 - /(p,p)) A 1Vr3,At=O (2) _ <92 vec G G dfj,' OpL1 At=O = A2-Dp (Ip vec Ip Ip)' ^ d3 vec G d/Lt' ® 9//' g>) { + (J „ ® J ^ ) } . (-Dg1 ® /(„,„) Dg21) Proof. This is a special case of Theorem 7 when Tl(M ) = % ^ Q j ' ^ s m^^ ar to the proof of Theorem 7 when C A 2 = 0 proven in the next theorem. Then, the first three derivatives can be reduced forms because P A 2 is equal to zero. The second and third derivatives of rj with respect to fj,' can be written as ri? = 0, T7P = D 1pK , I7P = 0, and % ' = - D 1pIC □ Under constraint A G Tl(TM ), where Tl(M ) = 77 ^ , the implicit pa­ rameter rj is restricted that Tq1 = 0. This condition reduces form of the first three derivatives of vec G in (2.33) with respect to n ' as shown in Corollary 2. The proof is shown in the following Theorem. 68 Theorem 8. For fixed V* from Theorem 6 and under constraint A G Tl(TM) yield­ ing C vec G = 0, where Tl(M ) = % ^ ^ ^ , then Vi = o, where % has dimension (p — m)q x I. Proof. Note that vec G = AiVgjLt + A 1V4lIg1 + A 21H2, where jit = Vfn*, % = Vfn*, and H2 = H*- It follows from (2.39) that C (IpZ -I(P ^))A 1 — U D V f = ^ C A 1 — U D V lpC I(P tf)A 1. (2.61) Since, C vecG = C A 1V* + C A 2H* = 0, substituting (2.61) in yields UDVln* + C I ( P t f ) A ltJ-* + C A 2H* = 0. Therefore, UD h1 C ^ A ^ + A g^) CAg M A i ^ + f7*). (2.62) 69 It will now be shown that C A 2 — (M cl AT ) A2 — 0 . It follows from (2 .4-2) that (M c' O AT ) A2 = ^[0 R!] O [T' 0]^ A2, where R is any (p—m) x (p—m) orthogonal matrix. Expressing A2 as YJj=i YJi=j(eVj® e ^ )e f, where i> j , w = pip + l)/2 , and I = U- 1Wp-J) j yieids (M cl ® AT ) A2 = ( t0 izI ® [T' 0 ]\ (Bpj ^ e pi )B f. j = l i = j ^ Accordingly, using (2.43) and (2.44) yields (JW0 O A T )A 2 = ^ f ^ v e c ( [T ' 0](e?e?') ( ° ) ) e » j=l i = j ^ V / / where i > j . Choosing j < i < m yields i \ R = 0. Choosing i > j > m yields [T' 0]ep = 0. Consequently, (M cl AT) A2 = C A 2 = 0. By replacing in (2.62), the result is UDrj1 = 0 Vi = 0. □ 70 Solving for Implicit Parameters Denote by £(0) the loglikelihood function of 0 given an observed Y . The MLE of 6 , 0, is the vector which maximizes £(0). To compute this maximization, the Newton-Raphson, algorithm is employed. Let 0O be an initial guess and Oi be the estimate of 0 after the ith iteration. 
Expand £(0) in a Taylor series around 0 = Bi, so that, # = ^ ) + ( e - e , y + 2 ( e - ^ y '^ ( 0 ) 80 O=Oi- ' 0^(9) O=Oi- (2.63) where Qei — cW(9) O=Oi and Hg. = 80 ® 89' O=Oi Set the derivative of £(0) to zero and solve for 0. The result is S = Si - H ^ g gt. ■ (2.64) Solving for r In reduction in the number of components, T4£ can be written as (2.11), where r is an implicit function of (p. For given (p, to solve for r in (2.11) subject to constraint (2.10), C 2 exp{©(T4^)} = 0, the equation (2.64) is employed as follows: T = Ti - H ^ g f i , (2 .65) 71 Qti = Cgexp(G)(J^M)) and Hf, = C^diag(exp{0(l^M )})j;^M where Solving for rj In reduction in the number of variables, vec G can be written as (2.34), where r/ is an implicit function of /Li. For given /Li, to solve for rj in (2.34) subject to constraints, GG' = Ip and C vec G = O, the equation (2.64) is employed as follows: where rj = % %2 ) 9fii V = V i - D) vec(GM GM' - Ip) Cvec CM , and „ _ / 2D;(G«)® JpXA1V4 A2] x “ I CIA1F4 A2] b In the special case described in Theorem 8, vec G can be reduced to vec G = Ai /it + A2TZ2- (2.66) To solve for Ty2, and can be simplified to gt- = D 1pVec(GMGW -Ip) and = 2D;(GMg\ where -ZVp — (-Tp2 - (^p,p))/2 .^rid -D^ ? — (-Zp2 (^Ptp)) • Boik (1998) showed that the maximum likelihood estimate of S , E, can be ob­ tained by solving s^(e) 80 = 0. fi= 0 (3.5) There is no simple, closed form solution for E. Therefore, the solution, E, to this equation has to be obtained numerically. Details are explained in the next sections. Fisher Scoring Algorithm The Fisher Scoring algorithm is an iterative procedure that can solve equation (3.5). The method is based on the derivative of the loglikelihood function and the expected value of the Hessian matrix. Denote by I(O) the loglikelihood function of O given an observed Y . The MLE of 0, 0, is the vector that maximizes 1(0). Let Oq be an initial guess and Oi be the estimate of O after the ith iteration and let Igi be Fisher’s information at the value O = Oi. That is, Ia = - E " a ^ (e ) O=Oi- Expanding £(0) in a Taylor series around O = Oi yields (3.6) £(0) sd £(0i) + (0 — Oi)1 gg. — ~(0 — 0i)'Ig.(0 — 0i). (3 .7) 76 Set the derivative of t{d), expressed as (3.7), to zero and solve for 0. The result is 9 = & + (3.8) where g§. is defined in (2.63). The right-hand-side in (3.8) becomes the new guess and the procedure is repeated until convergence. The above description is a conventional application of the Fisher scoring algo­ rithm. The present application is not conventional. Details are given below. Solving the Likelihood Equation Because there is no closed form solution for the MLE, the solution must be ob­ tained numerically. To compute the maximum likelihood estimates numerically under parameterization of A in (2.4), Boik (2002a)’s algorithm is adopted as follows: Step I Compute initial guesses for f and A (discussed in the next section). Denote these guesses by F0 and A0, respectively. S tep 2 Denote the maximum likelihood estimate of S after the ith iteration by S i = FiA if; Step 3 Set /Lti = 0 then Si = (O' <£>')'. Use one iteration of the Fisher scoring algorithm to update Si, that is, Si+i — Si + Igi1Qgi, 77 where g§. is obtained from (3.2) evaluated at 0 = Oi and I§. is defined in (3.6). This is not the conventional Fisher scoring algorithm because fa is not updated from iteration to iteration. Instead, fa is set to zero and updated from zero in each iteration. 
Step 4 By using the updated 0i+i from step 3, compute G( fa+1) from model (2.31) or (2.33), and Aw = A((pi+1). Step 5 Update f i+1 = t iG ( fa +1). Step 6 Iterate steps 2-5 until convergence. To numerically compute the MLE under parameterization of A in (2.9), step 4 of the above algorithm is replaced by Step 4 By using the updated 0i+i from step 3, solve for = V i f i+i + i+1, where Vi and V2 are given in Theorem 2. For given <£j+1, f i+i is obtained from (2.65). Compute Am = A(^m ) and G(fa+1) from model (2.31). Finding the Initial Guesses In using an iterative procedure, the original initial guess is very important. The algorithm may not converge to the MLE if the original initial guess is far from the MLE. The following manner is used for making initial estimates of f and A. 78 To find an initial guess for f under the hypothesis (1.21), H0: A G TZ(TM), the singular value decomposition of the sample covariance matrix is employed as follows: S r*A*f*/. f ; f ; 1 Assume that the diagonal entries of A are sorted from small to large and f is a matrix of the corresponding eigenvectors. The quantity f can be partitioned as , where and fg have dimensions p x m and p x (p - m), respectively. Denote the original initial guesses for f and A by F0 and A0, respectively. Let A* be a p x m semi-orthogonal matrix, where A* — (A Ac) and A c is orthogonal to A. Let Q be a block diagonalp x p orthogonal matrix which, when applied to [A* A*c] , results in a matrix which is as similar as possible to f . To find F0, choose Q such that F0 = [A* A*c] Q t t f * , (3.9) where A*c is an orthogonal complement to A*. Writing Q as a block diagonal matrix yields [A * A*1 ( E?1 Q2 ) “ (f i f O ) where Qi G 0(m ) and Q2 G 0(p — m). Accordingly, A*Qi % F1 and (3.10) A ^Q 2 % fg. (3.11) To solve for the orthogonal matrices Qi and Q 2 , the orthogonal rotation to congru­ ence by Cliff (1966) is repeatedly applied. This method minimizes \\A*Qi - F 1H2 79 with respect to Qi for (3.10) and \\A*CQ 2 — I^H2 with respect to Q 2 for (3.11) and maximizes tr( T e1Q 1A*1) and t i (T*2Q'2A*C), respectively. The results are shown in the algorithm below. It follows from (3.10) that [A A c] = T lQ 1 = IB1 B 2], (3.12) where B i and B 2 are the first q and the last m — q columns of T e1Q1, respectively. The quantity A c must be orthogonal to A. Accordingly, let A c = (Ip - A A 1)F for some F , where F is orthogonal by columns. Thus, (Ip - A A 1)F = B 2. (3.13) The following steps give an algorithm for computing T0. Step I Begin with any orthogonal matrix Q1. Step 2 By using Q 1 from step I, compute B 2 from (3.12). Solve for F (Cliff, 1966) in (3.13) as follows: write the singular value decomposition of B'2 (IP -AA ') as B 12 (Ip - A A t) = U iD iJ[ = J iU 1. Step 3 Compute Ac and A* by using F from step 2. Solve for Q 1 (Cliff, 1966) in (3.10) as follows: write the singular value decomposition of f^A* as T 1 A* = U2D 2J t2 = > Qi = JiU2. 80 Step 4 Repeat steps 2-3 until Qi converges. Step 5 Solve for Q2 (Cliff, 1966) in (3.11) as follows: write the singular value de­ composition of f 2 A*c as t*A*° = U3D 3J^ = > Q2 = JzU3. Step 6 Obtain f 0 from (3.9) by using Qi and Q2 from steps 4-5, respectively. To find A0 or <£0 that satisfies model (2.4), let A = diag(A ). 
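The orthogonal rotation to congruence used in Steps 3 and 5 of the algorithm above has a closed form. The following is a minimal Matlab sketch of Step 3, assuming that the semi-orthogonal target A* and the block of the sample eigenvector matrix it is matched to are already in the workspace; the names Astar and Gamma1hat are illustrative.

% Sketch: orthogonal rotation to congruence (Cliff, 1966).  Astar and Gamma1hat
% are assumed to be p x m matrices in the workspace; Q1 below minimizes the
% Frobenius norm of (Astar*Q1 - Gamma1hat) over orthogonal Q1.
M = Astar' * Gamma1hat;                % m x m cross-product to be aligned
[U, D, V] = svd(M);
Q1 = U * V';                           % maximizes tr(Q1'*M); Q1 is orthogonal
% Step 5 solves the analogous problem for Q2 with Astar and Gamma1hat replaced
% by their orthogonal complements.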
Differentiating ||A - T i exp{©T2y 0}||2 or minimizing ^A* - Ti exp{©T2v?0} j ^A* - Ti exp{©T2y>0} j with respect to exp{©T2<^ 0} yields 9 ^A — Ti exp{©T2y?0}^ ^A — T l exp{©T2y?0}^ 9exp{©T2y 0} Equating the derivative to zero yields T1rTiexplQT2^ = T1rA exp{QT2 = (T1rT1)-1 T1rA*. If (T1rTi)-1 T1rA* contains positive numbers only, then taking the log of both sides yields dp ' /J,=o i (.Dg1' (AT ' 0 T l) (S "1 0 S 1) (T 0 T )LD '1! \ D f (Ip m - 1) L D ^ . Expressing L as in (2.3) gives h m = I B g 1X (e? 0 ej) eJ'A-1 D f J=I = j T ( I ? - (e? 0 e JX A r1 D f J=I = 0, because i(p,p) (e^ ® e^) = (e^ ® e^). Similarly, It can be shown that - 0,tpii — ' d2£(0) ' dtp dp' ± D P 'L ' ( I r ® A -l)D , 2 = 0. ( i ) G Hence, Ig is a block diagonal matrix. □ 84 Due to the block diagonal structure of Ig, p, is asymptotically independent of cp which implies that f also is asymptotically independent of A. Accordingly, it follows from (3.15) that the joint asymptotic distribution of p, and cp is as follows: VS(A-M ) A w ( o j ^ ) , (3.19) VS(v - V) A JV (o, ^ ) , (3.20) and p, R ip. Asymptotic distributions of functions of 0 can be obtained by the delta method. Let 0 be a random p-vector with asymptotic distribution y/n(0 — 0) —^4 N (0 , Ig 1). Suppose that g(0) is a differentiable function. It follows that the asymptotic distri­ bution of g(0 ) is \/n[g{0 ) - g{0 )] N O,D(0)Ig 1D(O)' ? where D ( 0 ) = dg(d) 96' 0=0 The proof can be found in Sen and Singer (1993). To find the asymptotic distributions of eigenvector estimators, consider that t = TG vec f = (Ip r) vec G. 85 The Taylor expansion of vec f around /1 = 0 can be expressed as vec f = vec F + (lp ®F) (/I — /z) + Op (n-1) and Vn vec ( f - F) = y/ri(Ip ®F) (/I - //) + Op (n-1/2) . Using the delta method and (3.19) yields V5vec(f - r ) A j v f o , ( ir o r ) o f r : ‘ D ^ ' (ir o r ') (3.21) If Vi is identifiable, it follows from (3.21) that V5(7i - l i ) N 0, (Ir o e jr ) D ^ '( I r OVei) (3.22) Similarly, the relationship between A and A is X = L' vec A. The Taylor expansion of A can be written as A — A T ^(y — ( p ) + Op(n )^ and Vn(A -A ) = Vn (

oo, - 0 ) N (o, I 0 1M 0 , where f l = Iimn^ 00 (S -1 ® S -1) ^fn (S _1 ® S -1) and ^ rn = nvar(vec S). Boik (1998) provided an unbiased estimator of Asymptotic distributions o f f , J i, and A can be obtained by substituting I 0 1H I0 1 for I 0 1 in equation (3.21)-(3.23). 87 CHAPTER 4 TESTING HYPOTHESIS AND BARTLETT CORRECTION In principal component analysis, it is often of interest to test hypotheses concern­ ing eigenvalues and eigenvectors of the covariance matrix. The matrix S is rarely known and tests of hypotheses must be based on an appropriate estimate of S (dis­ cussed in the previous chapter). This chapter deals with procedures for testing di­ mension reduction. The procedures are based on the standard likelihood ratio test. The hypotheses of interest concern the dimension reduction in the number of com­ ponents and in the number of original variables. For each hypothesis, the likelihood ratio test statistic is derived and improved by multiplication by a Bartlett correction factor which provides a better approximation to a X 2 distribution. Hypotheses Let Ai < A2 < • • • < Ap be the eigenvalues of S and T = ( J 1 J 2 - ■■ 7P) be a matrix of the corresponding eigenvectors. In testing dimension reduction in the number of components, the eigenvalues give information about the dimension of the data. If the cumulative proportion of total variance accounted for by the p — k smallest eigenvalues is small, then little information is lost by discarding the corresponding components. The general null 88 hypothesis of interest is defined in (1.8) as where Ci is a matrix of known constants and C0 is a vector of known constants. This hypothesis is tested against the alternative Ha which says that Ho is not true. Specifically, to focus on dimension reduction in the number of components, Ci is defined as (l'v_k O7fe)' and C0 is a small number recommended by an expert in the area of study. The test of this hypothesis is useful for constructing a confidence interval for proportion of variance. In testing dimension reduction in the number of original of variables, the general null hypothesis of interest is defined in (1.21) as where A is a fixed p x g semi-orthogonal matrix with rank q and F is a matrix of eigenvectors. The alternative hypothesis is the general Ha which says Ho is not true. Suppose A is equated to A = {Iq O')'. Then, the hypothesis to be tested is that the first k components have zero loadings in the last q positions. This test is the redundancy test. If the hypothesis is not rejected, then the last q original variables have zero coefficients in each of the first k components. Hence, these variables are redundant and can be discarded from the study. In deriving suitable tests for testing these hypotheses, the well-known likelihood ratio tests are used. 89 Likelihood Ratio Test This section introduces the likelihood ratio test (LRT) related to the maximum likelihood estimates discussed in Chapter 3. The general strategy of the LRT is to maximize the likelihood function under the null hypothesis, and also to maximize the likelihood function under the alternative hypothesis. For the loglikelihood function in (3.1) and the maximum likelihood estimates from Chapter 3, the likelihood ratio test statistic is L 0 ( 0 O) Zfa(Oa)' (4.1) where L0(Oo) and La(Oa) are the likelihood functions maximized under H0 and Htt, and 0O and Oa are the maximum likelihood estimators under H0 and H0 respectively. 
Equivalently, it follows from (4.1) that H0 is rejected for large values of Q = - 2 In A* = 2 { 4 (0 ,) - Z0(O0) ) = D(0,; 0) - D (0O; 0), (4.2) where D(O) 0) is defined as 2 |l (0 ) - 1(0) j and i(-) = In L(-). Denote the dimensions of 0O and Oa by z/0 and z/0, respectively. The test statistic Q asymptotically follows Q (4.3) where The above general result may be used to obtained a test statistic in variety of situations including hypotheses of interest in (1.8) and (1.21). Boik (2002a) discussed a way to compute D(O) 0) in the following manner. 90 Denote the ith derivative of the loglikelihood function, evaluated at // = 0, by £i(9). That is, 4 0 = T - 4 0 = 90 '@ 90 ' , (0] 8V(6) ^ ^ 0 0 '(E) 00'(3,00' and so forth. Let E -J = n-1 E(li(0)) and ^ , where E(Zi) = 0. Therefore, £i(0) = ■\fnZ{ + nKi. (4.4) Let 0 be a consistent root of the likelihood equation such that l\(0) = 0. Expanding l\(Q) in a Taylor series around 6 = 8, substituting Ii(O) from (4.4), and dividing both sides by ^/n yields 0 = Zi + (ji 2 9 + (j1 + n ^ (g (E) g) (n - l z 4 + U-1ET4) ( g® g® g ) + Op , (4.5) where g = ^ ( 6 — 6) and K i = 0. Let g — -\/u(0 — 6) — Sq n 2Si n 1^ 2 + Op ( n^ 2) . (4.6) 91 To solve for Si in terms of Z i, replace g in (4.5) by (4.6) to obtain polynomial functions of n which have zero coefficients. Collecting each of coefficients and solving for d, yields the following results: d0 = I g 1Z 1, <5l = Ig 1 I ^ 2 0^ + 2 (^0 ® 0^) J- ’ ^2 = Ig 1 "I ^ 2^1 + 2^3 (^o ® do) + Ks ^do <8> di) + — !^4 ^do ® do do) ^ , where K 2 = - Ig . To express D(6-, 0) in terms of Z i, i = l , 2 ,3, expand D(0] 0) in (4.2) in a Taylor series around 0 = 0 and substitute £i(0) as (4.4) to obtain D(0] 0) = 2g'Z 1 + g' ^Z 2-\- K 2J g + -g ' 1Z^ + n ^.Ks) (gf ® g) + - g ' ^n- I z 4 T n -1K 4) (g ® g ® g) + Op (n - l ) . Substituting g from (4.6) and collecting terms of Op(I), Op(n- 1) and Op(n-1) yields D(6; $) = Z ? I eZ l + n~i { Z ?K 3 ( Z l® Z { ) / 3 + Z t1Z 2ZD (4.7) +n - i {ZsZ; + Ka(Z* ® Z*)/2}'T^ {Z^Z* + Ks(Z* ® Z*)/2} +Z*'K4 (Z* ®Z*® ZH /12 + Z^Zs (Z ^® Z*) /3 + Op (n -# ) , where Z^ = I 0 1Z 1. Because it can be shown that Z 1A iV (O 5J e) , 92 the Op(I) term in (4.7), Z^'IgZ^, is asymptotically distributed as X l and E (Z^1IgZ f ) v. The Op and Ov (n-1) are useful for constructing Bartlett corrections. Ma­ trix expressions for Z i and K i for 1(0) in (3.1) are shown in the last section in this chapter. Bartlett Correction It follows from (4.3) that <3 ~ X L + °p ( " _ l ) • The Bartlett corrected test statistic (Boik, 2002a) is Qc = Q %/AE(QIH0) which has null distribution Qc ~ + Ov j , where E(QlHo) = E {Dfe,,; e,,)|H0} - E {d (»o; #o)|H0} (4.8) (4.9) The approximate expectations can be obtained by evaluation of the expectation of equation (4.7). To compute the expectation of D{6\ 0) in (4.7), the third and fourth order moments of vec(S — S) are needed. Third and Fourth Order Moments Boik (2002a) discussed an easy way to get these moments directly from the cumulant generating function in the following manner. Let S = ; where Ui iV(0,E), then S ~ Wv(n, S /n ). Note that Dp and H p satisfy Dp vech X = 93 vec-X", H p vec X = vech X , and DpH p = N p, where X is any p x p symmetric matrix and N p is defined in (2.47). The moment generating function of Vech(Sr) can be written as M vech(S) (t) = E [exp (t' vech (S))] = E [exp ( f ITp vec (S))] |E*|n/2 ™ • |S |n/2 ’ where vech S* 1 = vech S -1 — | [D1pDp) 11. Then, the cumulant generating function of vech (S) is = | l n | S * | - | l n | S | . 
To find the first four derivatives of the cumulant generating function, the following results are repeatedly used: 9 vec 2 * a t' a vec E* : — HL and - (£* O E*) H 'n F (4.10) (4.11) The first derivative of the cumulant generating function can be written as ( t ) ac(°)(t) aC(°)(t) nd ln |E * | 2 at as* ac(o)(t) = (vec H*)'H' 94 Therefore, C w (t) = HpVecE* or vechS*. (4.12) The second derivative of the cumulant generating function can be written as C {2\ t ) = __ avecSdt' ® dt = H n = - H Pcs* o (4.13) The third derivative of the cumulant generating function can be written as c (% ) .d3C ^ { t ) _ 2h d{S* ® dt' 0 dt' ® dt n p dt' L (I , ®(vec Jp)' ® Z*) <8 J f ; -^Jfp (S* 8 (vec 2*)' 8 E*) % 8 JTp)', (4.14) by (vec Jp)' (E* 8 Jp) = (vec E*)'. To compute the fourth derivative of the cumulant generating function, the following results are useful: E (A B C D) = {(vec B) ' 8 E } [(Ir 8 vec A )C 8 D ] , where B is q x r, A , C, D, and E are any matrices that are conformable for multi­ plication, and E (A 8 BCD ) = {(vec C)' 8 E } I(tP,qr) [A 8 (vec B ) 1 8 D ] , where A is t x s,. B is p x g, C is g x r, and D and E are any matrices that are conformable for multiplication. The fourth derivative of the cumulant generating 95 function can be written as ( t ) ^C(Q) (t) d t <8> d t ' ® d t ' ® d t - H p^ (ip ® vec Ip ®(vec S*)' ® S*) + ® (2* # (vec S*)' ® vec f , ® Ip) + (ipfp+D O Hp) ^ ® S* 0 (I(p,p2) ® Ip) I (Up ® H p )'. 8 f /a(vec2*)V Substituting (4.11) and simplifying yields OO c ^ ( t ) = — (Ifp®Ifp)iVp,{2*®(vec2*y®vec2*®E*}(IZp®apy (4.15) +%#„ ® Hp) (^ ® I(p,p) ® ip) (E* ® S* ® S* ® S*) (ilp ® Hp)'. The matrix S*, evaluated at * = 0, simplifies to S. Suppose to is a random vector with E(tu) = cr. It can be shown that C^(O ) = E(tu) and (O) = E {(iy — a)(w — cr)'} . Equivalently, C^(O ) = HpE[vec(S - E){vec(5 - S)}'] H 1p, (4.16) where w = vech S = H p vec S . Pre-multiplying (4.13) by Dp and post-multiplying by .Dp and using (4.16) yield E {vec(S - S){vec(5 - E)}'} — N p^E ® E) = var(vec S), (4.17) because N pvec(S — S) = vec(S — S) and -ZVp(£ ® S) = (S ig)S ) N p. The third cumulant of w can be written as 9 6 . (0) = E {(to — cr)' <3 (w — cr)(w — cr)1} . Equivalently, C (3)(0) - H pE [{vec(S - S)} ' 0 vec(5 - S){vec(5 - S)}'] (Hp 0 Hp)', (4.18) where w = vech S = H pYecS. Pre-multiplying (4.14) by Dp and post-multiplying by (Dp 0 Dp)1 and using (4.18) yield E [{vec(S - S)} ' 0 vec(5 - S){vec(5 - S)}'] = ^ JV P{E 0 (vec S )' 0 S} x (Np 3 N p). (4.19) Similarly, the fourth cumulant of vech S can be written as C^(O) = ( g p O # ,) E { vec(3 - S){vec(S - S)}' 0 vec(S - S){vec(S - S)}'} —2Np2 [E{vec(5 - S){vec(5 - S)}'} 0 Ejvec(S - S){vec(5 - S)}'}] — E{vec(S — S) 0 vec(S - S)} E{{vec(5 - S)} ' 0 {vec(5 - S)}'} x (H p 3 H p)1. (4.20) Pre-multiplying (4.15) by (Dp 3 Dp) and post-multiplying by (Dp 3 Dp)' and using (4.20) yields 97 E [vec(S - S){vec(5 - £ )} ' ® vec(5 - S){vec(S - S)}'] = 32 —N p2(Np ® jVp){5] ® (vec E )' vec E ® E }(-ZVp ® JVp) ■ +^(AZp ® AZpXJp ® -T(Pf) ® 4 ) ( S ® E ® E ® E)(AZp ® AZp) +2AZp2[E{vec(5 - E){vec(5 - E)}'} 0 E{vec(5 - E){yec(5 - E)}'}] ■ + E{vec(S - E) ® vec(5 - E)} E{{vec(5 - E)}' ® {vec(S' - E)}'}. (4.21) To compute the last part of (4.21), the second derivative of (t) with respect to t evaluated at * = 0 can be obtained as D2C ^ (t) _ 2 d t® d t n (Hp ® Hp) vec (E ® E ) . (4.22) Algebraically, d2C ^ ( t ) dt ® dt E{vec(S' — E) ® vec(5 - E)}. 
(4.23) Pre-multiplying (4.22) by (Dp ® Dp) and using (4.23) yields E{vec(S — E) ® vec(5 — S)} 2 (AZp ® AZp) vec (E ® E ) . (4.24) n Similarly, E [{vec(S - E)}7 0 {vec(5 - E)}'] = ^ {vec (E 0 E)},(AZP 0 AZp). (4.25) Substituting (4.24) and (4.25) in (4.21) yields E [ vec(£ - E){vec(S - E)}' 0 vec(S — E){vec(S - E)}'] — 32AZp2(Np ® AZp){E 0 (vec E )' 0 vec E 0 E}(AZP ® AZp)rv 98 +-g (A?, ® -Np) (ip # 8 ip) (S 8 Z 8 S 8 Z) % 8 TVp) g + — iVp2 (Np 8 JVp) (Z 8 Z 8 Z 8 Z) (Np 8 N p) + - ^ (N p 8 N p) vec(Z 8 Z){vec(Z 8 Z)}'(iVp 8 N p). (4.26) The results of the third and fourth order moments of vec(S — Z) are used for com­ puting the expectation of D(0-, 0) in (4.7). Expectation of LRT Statistic Partition 0 as 0 = (fi' tp')' with dimension (vi U2)', where u = ui + u2. In particular, k = 2. Denote the derivatives of vec Z with respect to 0 as follows: !Pd) - dvecZ _ A dvecZ S= I 5 f2s d2 vec Z A A 52 vec Z . . ^ - 0 0 '8 0 0 '" ^ 5Z 3% e 0 0 ; ® S= I t = i 5 z = ^ YeC^ l- — dvec (F(2\ p 2u, u ) ; 00 8 00 0^ vec Z k k k 03 vec Z S=I t — 1 U = I^ - 00' 8 0(9'8 6)0' " 6)0: e 0 0 :8 00 :/^ '" ® ® jp(ni) _ dvec (F^3\ p 2u2, u ) . The matrix E iyl, with dimension z/% x z/ for I < % < & is defined as Fiyl, — [Oyjxaj Jyj 0UiX{v—ai—Vi) ] • where a, = Y^j-i vj ~ vi- For convenience, define J 1O), j p (2) , f X^\ and F' ^ as F {1) = (Z "1 8 Z ”1) F (1); F (2) = (Z -1 8 Z " 1) F (2); l ( 1 ) (Z -1 8 vec Z - 1 8 Z "1) F ^ ; and F 2 = (Z-1 8 vec Z "1 8 Z ' 1) F® 99 Boik (2002a, in a supplement) derived the expectation of the right-hand-side of the equation (4.7) as follows: I E{D(9; 0)} = dim(O) 4- & + O M , (4.27) where Cl u {1) Cs Cs C4 L 1 Cs Ce Cr Cs U(% tr { (lg-1 p w 0 JVp) (Ip O v e cS o Ip) Iq 1Uw } ; (I^ 0 S -^ 0 S-^) - 2 ( f 0 lp ,) ; \ tr { ( F w I ' 1 O F w I q 1)' F (1)/ I q 1K ; ) ; 2 F {1)' (Fw O F w ) - dvec F (2), z/2, z/)' - ^ F w ' F (2); ^ tr [{iy" 0 JVp(S 0 S)} [ / ( ^ " [ 7 ^ ] ; L 1Iq 1L 1 + L 12 ( IM O I0-1) X2; if /W vec (F^) Iq 1) ; X2 = ^ vec { t / ^ (l„ O F « I 0" 1) } ; ^ t r {JV„ ( I ^ 1 0 1^1) ^ ( v e c ^ 1) ' ^ ' ^ 1^ vecj^-1; tr {JV„ (Iq1 O I g 1 F w ') U wiI g 1K ; ] ; ^ (vecI ^ 1)' vec ( F ^ I ^ 1) ; i tr { JV, ( I g 1 F w ' O I0- 1) U (2)} ; —6 dvec ^F*' ^ , p2zv, p2) F ^ + dvec dvec { (S 1O S ' -1) F ^ ) ,p V , '} ' ,P2v, P2. +2 (IP2 O F (1))' (Ip OvecS 1O Ip) (2 JVp + I(P)P))X3; 100 (s = l { v e c ( ^ ^ ) y u ( " ) ( v e c ^ ) ; Cio = ^ t r { N X ^ ® ^ ) ^ } + l ( v e c ^ ) ' ^ v e c i y \ and j q = 12 - 9 ^ - 2 dvec (f (1)/ F (3\ u2, z/2j - ^ F®' F {2). Under a Restricted Ha To compute the Bartlett-corrected test statistic in (4.8), the expectation in (4.27) is used twice to obtain the right-hand-side of (4.9). Under an Unrestricted Hffl To compute the Bartlett-corrected test statistic in (4.8), the expectation in (4.27) is used once to obtain E {-D(0O; 0o)|Hq j in (4.9). To obtain E ^D(0a] 0a)|Ho| , Boik (2002a) showed that if S is unrestricted under H0, then the asymptotic distribution of Q in (4.2) is X lAi where . Furthermore, b {5(»„;6„)|Ho} = + P (2P2 + to - 1J + o (n~2) . (4.28) To verify equation (4.28), it follows from (4.2) that B {D (0o;0o)|Ho} = 2 E { f ( 0 , ) - W } = 2E — In |5 | + - I n |S | | = 2E { - | I n |nS| + y ln(n) + | In |S |} . 101 Seber (1984) showed that InSfI ~ |S | HLi X n - i + i - Accordingly, p E 0a) |H0} = n p ln (n ) -n J ^ E ( ln X ^ _ i+1) . (4.29) i = l The expectation of ln(y), E{ln(y)}, assuming y ~ can be obtained by differenti­ ating both sides of r (D = I iexp{-% }j 2# % (4.30) with respect to n, where T(-) is the gamma function. 
The useful result concerning the digamma (-0) function is as follows: ainr(ra) _ r(w ) dn T(n) " Then, the result of differentiating both sides of (4.30) yields ^ G D = I V2 iexp{-^} 2 ir(% ) 2J dy — ln(2). Therefore, E{ln(y)} = -0 ( I ) + ln(2). Expand the digamma function to obtain ( I ) + ln(2) - ln(n) “ ^ “ 3 ^ + ' * ‘ (4-31) Applying result (4.31) to (4.29), simplification yields the result (4.28). 102 Matrix Expressions for Z i and K i To obtain matrix expressions for Z i and K i, the second and third derivatives of vec S with respect to O1 evaluated at jit = 0 are needed. The following properties are repeatedly used: {A BC ® D )E = [A ® {vec(C")}/ D]{vec(JB/) ® E} and (4.32) { A ® B C D )E = {A (g) vec(D') 0 B } (B 0 vec C), (4.33) where A, B , C, D, and E are any matrices that are conformable for multiplication. It follows from (3.4) that d2 vec S d^i' ® dy' 2N„ (TCA ® r ) d2 vec G dn' ® dii' I ® d vec G X ~ w ~ ) : Applying (4.33) yields d2 vec E dfi' ® dn' 2 Np (TC?A ® r) a2 vec G dii' ® dfi' +{T ® (vec A)' ® r} T 9 vec G 1M -Q ^ r - d vec G dfi' (4.34) Therefore, evaluated at /4 = 0, 92 vec E dfi' ® dfi' = 2 AL £1=0 (TA ® r ) + {r O (vec A)' ® r} (l(p,p) ® 103 Also, it follows from (3.4) that - “ 4 {,r»“ r|2 s ^ - > » , # '^ j(r(g ,(v ec jpy (g ,r) by (4.32). Therefore, evaluated at ^ = 0, d2 vec S d(p' dpi' At=O = 2 N v (r ® (vec I v)' ® r) ® B g }} . (4.35) The second derivative of vec S with respect to ip' can be written as d2 vec S dip' ® dip' ( r ® r ) X B f . At=O Similarly, the third derivative of vec S with respect to y>' can be written as d3 vec S dip' dy?' ® dyi' = ( T O T )T B f . At=O Next, it follows from (4.35) that d3 vec S dy ' O dy?' ® dp,' It follows from (4.34) that At=O d3 vec S dip' O dpi' O d/T = 2 AL 2 TVp {T ® (vec Jp)' O T} B f O B f I . d

y 0> ;4.38) Combining (4.37) and (4.38) yields 93 vec S <9y/ O 9/Li' O 9/Li' 2 AL H=O {r O (vec Jp)' 8 r} { & 8 } {(•+ <( (vec L ) o r o r X 8 ( jp 8 j(p,p) 8 jp) (D y ® Next, it follows from (4.34) that 93 vec S______________= 2 N 9/Li' O 9/Li' O 9/Li' p 9 9 - ( ( r G A ® r ) | l vecG 9 /i' O dfi' Zt 9 vecG _ 9 vecG \ (4.39) 9fi' [ ~ { r ® (vec A)- ® r } ™ 7 - ® - ^ r - J For convenience, the two parts on the right-hand-side of (4.39) are labeled as E c and Ed- Using the product rule and (4.33) and evaluating at /i = 0 yield #C = {r 8 (vec A)' 8 r} (j(p,p) J^ y 8 J^y) + (FA 8 T) G g ). (4.40) Using the Kronecker product rule, E a, evaluated at /Li = 0, simplifies to % = 2 iVp {r 8 (vec A)'8 r} ( j^ rg^ ry ) + I ( p 2,p2) O y ® J(p,p) G y ) ( I v 1 O I(vi , i>i)) 105 Using the following results repeatedly: [ A ® S ) — I(p,r ) { B ® A ) I ( Ctq), where A is r x c and B is p x q, yields E d = 2 N p I (r dp,' ® dfi' = 2 N n H=O (FA ® r) D i^ +{r ® (vec A)' ® T} x <1 (Z(Pf).Bg) ® B g )) + 2 (/(pf) Z)g) ® B g )) (Z ^® The other order derivatives can be rearranged by post-multiplying by commutation matrices. d2 vec S _ d2 vec S dG't ® OO1s = dO's ® BG1t livt^ d3 vec S _ ^3 vec S . . dG't ® OG1s ® _ dG '^ dG '^ dG '} 1^ ’^ ® Vu) _ S3 vecS (t ^ T \ The following properties are useful for simplifying expressions. F (1) = I M .FM F<2> = I m FW = F ^ I m f< n > = ( / „ « / , , ,„ ) I ’M 106 fW = I m .F<8> = F® = jr<«) = I(W)) = ® I v) F auj = V m »/#>) ^ llul = (J»- ® /(»,)) i ?(lll) ^ ^ ' ( E - 1 ® S " 1) = 2Ig F (1),(S ® S) F (1) = 2lg (S _1 ® S -1) vec S = vec S -1 d v e ^ = - ( 2 - i <8 D -i) F ^ = F « It follows from (4.4) that K i = 0. Then Zi = ^ F ^ ( S - 1Ig) S -^ v e c (S f-S ) 2 = ^ F (1),v e c (S -S ) . The second derivatives of £(8) can be written as 4 (9 ) = ^ FM '{ f^ < S ,(S -^ S ^ )v e c (S f -S ) } + n F (1)/ ( Ip(S)(VecIp)' S -1} ® vec(5 - S) j^ " F {1)' (S -1 Ig)S-1) F (1). Accordingly, ^ (0 ) = ^ ( ^ ' { ! ^ ( S ^ e S - ^ v e c ^ - S ) } -MF^)' ( s -1 ® (vec S -1)' ® S -1) (F ^ 8 vec(S - S)) " F {1)' ( S - 1Ig)S-1) F (1). 107 Taking the expectation of I2 (0) and dividing by n yields K 2 = (S _1 ® S - 1) F (1) = ~ F (1)/ F (1) = - I 0. Therefore, Z2 = ^ F (n > {Jv ® (S "1 ® S - 1) vec(5 - S ) } - Vn F ^ ' (F^1) ® vec(S - S ) } . Boik (2002a, in supplement) gave expressions for Z3, K 3, and K i as follows: Z3 = v ^ ( v e c F « ® F « ) (Jp z, ® J(p)P) /p) ( J iy ® F (1) ® vec(S - E)} + 2V n F ^ [F ^ ® 2Vp(]p ®vec S ' 1 ® ],)' {jX1) ®vec(3 - S )}] F ^11)' {J#: ® ( S ' 1 ® S ' 1) vec(3 - S )} - 2 Vn F ^ 1)' (Jv ® S ' 1 ® vec S"1 ® S"1)' { (Jv ® F ^ ) N v ® vec(g - S ) } - V n F ^ ( F ^ ® vec(5 — S)} ; K 3 - 2 F (1)' ( F ^ ® F ^ ) JVv - F (n > ^Jv ® F (1)) N u F ^ ' F ^ ; FT4 = F (11),{ lv® (S -1^vec S - 1 ® S - 1J7(F w ^ F w )J x {4(NV ® Jv) + 2 J(V3,V)} + 4 F^^' (F w ® F^)) j (Jv ® N v) + ^ J(v,v,) j - F (1)/ (S "1 ® vec S - 1 ® vec S " 1 ® S ”1) ( F (1) ® F (1) ® F (1)) x {2 J(VjV2) + 3 1(v2,v) +4(Nv ® I v)} - F (111)' ( JV2 ® F w J I (Jv ® N v) + ^ J(V,V3) j - F (11)' ( Jv ® F w J j (Nv ® Jv) + ^ J(V,,V) I - ^ F ^ )' F w ; where N v = \ ( I vi + J(v>v)). 108 CHAPTER 5 SIMULATION In this chapter, the performance of the likelihood ratio test and Bartlett correction (see Chapter 4) are evaluated and compared to competing tests. Simulation studies were conducted under 2 applications described in Chapter 2. Simulation Study of Variable Reduction (Redundancy) Tests The evaluation is based on test size and power of the test under different condi­ tions. 
The test statistics examined were o Likelihood ratio test (LRT) o Bartlett correction of LRT (BART) o Schott’s (1991) test (Tktq) in (1.16) o Bartlett correction of Tktq (T*) in (1.18) o Bartlett correction of Tktq (T**), where T** = ^ . An expression for c can be found in Schott (1991). The conditions under which the simulations were conducted are described in terms of the number of original variables (p), the number of retained components (k), the number of variables to be tested for redundancy (q), and the degrees of freedom of a Wishart distribution (n). The following conditions were considered: o p = 10, k = 2 and g = 1,4 109 o p = 10, k = A and 5 = 2,5 o p = 15, k = 2 and q = I 0 p = 15, k = A and q = 2. The degrees of freedom, n, varied from. 25 to 75 in the following manner, n E {20,25,30,35,50,75}. The nominal test size used was 0.05. For each simulation condition, Type I error probability was computed from 1000 data sets. The eigenvalues were ordered as follows: Ai < A2 < • • • < Ap. For simplicity, the covariance matrices examined were diagonal and have Ai = • • • = Ap_& = I. The Comparison Test Partition T as r = / X Fn q x m T2I ( m—q )Xm F3I (p—m ) x m Fi2 t?x(p-m) F22 (m —q)X (p—m) F32 (p—m)x(p—m) \ / Consider the special case where F3I = 0. In this section a test of hypothesis (1.21), where A = ^ ^ and M is the first m columns of I p, will be developed. Although this test is not useful in practice (because F3i = 0 is not likely to be satisfied), it is useful in improving the efficiency of the simulations by constructing an estimator of test size whose variance is less than or equal the usual estimator. Suppose that it is known that F3i = 0, A = f 9^ j , and M — ( J . Then, A e Tl(FM) ^ F 'A G Tl(M) HO Let y be an n x p matrix with distribution vec Y ~ N(0, S ® I n) . Then, S = TTp (n, E) and S = FAF' . Partition T as ( T1 T I. If it is known \ n x q n X ( m —q Y3 - ) n X (p—m) that F31 = 0, then A G K (TM ) <=> S A; OO A2 F i1 Fg1 o' o' T122 T132 Fn 0 F2I F22 0 F32 F11A1F11 F11A1Fg1 0 F21A1Fz11 F21A1Fz21 + F22A2Fz22 F22A2Fz32 0 F32A2F22 F32A2F32 T1 jl T3. Accordingly, a test of independence between T1 and Y3 is a test of H0: A G K(TM)- if F31 = 0, A , and M jlTH0 . Partition S as S = S12 S13 q x q gx(p-ro) S21 S22 S23 (ro-s)x? ( m—q ) x ( m —q) (m—g)x(p—m) S31 S32 S33 (p-m)Xg (p—m)x(m—g) (p—m)x(p—m) The test statistic is A1,, — S 11 S i3 S 3i S 33 ISfn||5 33| ’ which is distributed as Wilk’s A, where k = p — m: An approximate F- statistic (Rencher, 1995) is given by (I - A 1J t) dhF = (5.1) I l l which has an approximate F-distribution with Clf1 and d/2 degrees of freedom, where when qk = 2 then t is set equal to I. If min(g, k) is equal to either I or 2, then the F approximation in (5.1) has an exact F-distribution. Anderson (1984) gave the asymptotic distribution of M = - s log(A9)fejn_fe), where s = (n — k) - ~{q - k + 1). The error is of the order O (IV~6). Box (1949) provided the coefficients to be used in term of O (N~6). The cdf of M can be written as dfi = qk, P(M < mo) — Pqk + ^2 (Pqk+4: ~ Pqk) 74 = Y + 19^0 {394 + 3A:4 + IOg2A:2 - 50 (g2 + k2) + 159} , and = f + ™ { (g6 + A;6) — 105 (g4 + fc4) + 1, 113 («2 + P ) + (2Ig2 - 350 + 21&2) q2k2 - 2,995}. Te 112 Variance Reduction Procedure Consider two test statistics, namely the target test, L, and the comparison test, F. Suppose there are two outcomes of testing any hypothesis: rejection of the null hypothesis denoted by I and failure to reject the null hypothesis denoted by 0. 
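For reference, the comparison statistic of the preceding subsection can be computed directly from the partitioned sample covariance matrix. The following is a minimal Matlab sketch, assuming S, p, m, and q are in the workspace and that the variables are ordered as in the partition above.

% Sketch: Wilks' Lambda for independence of the first q and the last p - m
% variables (the comparison test); S, p, m, q are assumed to be defined.
i1 = 1:q;                              % indices of the block S11
i3 = (m + 1):p;                        % indices of the block S33 (k = p - m variables)
S11 = S(i1, i1);  S13 = S(i1, i3);  S33 = S(i3, i3);
Lam = det([S11, S13; S13', S33]) / (det(S11) * det(S33));
% Equivalently, Lam = det(S11 - S13*(S33\S13')) / det(S11).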
Let riij denote the number of observations that belong to the outcomes in the ith row and the j th column of the L and F tests. The 2 x 2 table of the number of observations can be written as follows. F I 0 Total I n n M l2 M i. 0 M2I M22 M2. Total n . i n .2 N Let Pij be the probability of an observation in the ith row and the j th column. Let a and a* be the sizes of the F and L tests, respectively. Because the comparison test is exact (or nearly so), it is known that E(n.i) = aN, whereas the expectation of rii. is unknown. The likelihood function, conditional on n.i, can be written as L(PK1) = ( ^ n.2 nn P22 1 — a n.2—ni2 P22 I — a ni2 5 where a = Pu + P21 and I — a = Pi2 + T22. 113 Then, the loglikelihood function, conditional on n.i, is = nn In ^ ^ + (n.i - nn) In ^ l — + (n.2 — ni2) Ir + n i2 In I PtI — a + constants. The MLE of Pu, Pu, can be obtained by solving af(P|n.i) 0. Therefore, A r = n.i Similarly, by solving = 0, the MLE of Pi2, Pi2 is Pu = ^ ( 1 - a). 71.2 Accordingly, an estimator of a*, say a, is a = Pu + Pi2 = — a + — (I - a), n.i n.2 The expectation and variance of a, conditional on n.i, are E (oi|n.i) = Pu + P l2 = a* and var (a|n.i) = - P u ( I - — ^ + -— -P i2 I I -n.i n.2 Pr I — CK Accordingly, P22 I — CK var ( ck) = E [var (a |n .i)]. 114 Because n.i ~ B(N, a) and n.2 ~ B(N, I — a), then using E K 1) E (n^1) aN and (I - a)N yields var (a) « 01 01 ^ (l - (p2) , where a* = P11 + P12 and 1P n.xn.2ni.n2. Let d = % be the usual estimator of a*. Then, var (a) a* (I - a*)N The relative efficiency of a to d is given by var (d) _ var (a) (I — (p2) Therefore, the effective sample size, N*, is N ( i - y f ) ' All computations were executed using Matlab programs. Details including the programming code are given in Appendices. The test statistic T* in (1.18) was pro­ grammed using Schott’s (1991) expression for c. However, the value of c, using the data from Example I, is not identical to his result for the same data set and this 115 leads to different value of T*. The program was checked thoroughly and no errors were found. Professor Schott was asked about the discrepancy and he responded that he have been unable to locate his code and thus this thesis should proceed under the assumption that the program used is correct. The following tables report the results of simulations. A cell in each table presents the following quantities: for a test with nominal a — 0.05. / ' 116 Table 5. Probabilities of Type I error when (p, k, q) = (10, 2, 4) and A = (lg, 10, 25)'. n LRT BART F Tk,q T* rp** 20 0.0610 0.0440 0.0470 0.2690 0.0700 0.1170 0.0625 0.0452 0.2713 0.0728 0.1198 1263 1205 1154 2479 1592 30 0.0820 0.0540 0.0590 0.1780 0.0750 0.0970 0.0763 0.0497 0.1701 0.0662 0.0884 1420 1339 1407 4410 2402 50 0.0560 0.0480 0.0520 0.1100 0.0580 0.0620 0.0546 0.0468 0.1081 0.0560 0.0600 1895 1629 1797 9164 5877 75 0.0510 0.0430 0.0450 0.0860 0.0560 0.0590 0.0546 0.0465 0.0908 0.0609 0.0639 1829 2045 2003 4861 4024 Table 6. Probabilities of Type I error when (p, k, q) — (10,2,4) and A = (lg, 5,10)'. n LRT BART F Tk,q rj-i* rji** 20 0.0680 0.0410 0.0470 0.3050 0.0470 0.1190 0.0695 0.0421 0.3072 0.0491 0.1216 1201 1198 1126 1896 1494 30 0.0750 0.0490 0.0590 0.2050 0.0610 0.0990 0.0694 0.0441 0.1974 0.0535 0.0907 1450 1539 1321 3096 2120 50 0.0570 0.0460 0.0520 0.1170 0.0640 0.0730 0.0556 0.0447 0.1151 0.0621 0.0710 1863 1966 1706 4340 3295 75 0.0560 0.0490 0.0510 0.0960 0.0620 0.0630 0.0553 0.0483 0.0950 0.0611 0.0621 1837 1881 2024 3944 3756 117 Table 7. 
Probabilities of Type I error when (p, k, q) = (10,2,4) and A = (1'8, 2,10)'. n LRT BART F Tk,q T* n^** 20 0.1520 0.0460 0.0550 0.5460 0.0320 0.1920 0.1495 0.0441 0.5437 0.0311 0.1894 1108 1199 1046 1056 1103 30 0.1200 0.0490 0.0580 0.4140 0.0370 0.1530 0.1156 0.0458 0.4093 0.0351 0.1481 1185 1238 1083 1092 1183 50 0.0910 0.0520 0.0520 0.2630 0.0550 0.1080 0.0898 0.0511 0.2614 0.0541 0.1066 1289 1257 1181 1236 1306 75 0.0670 0.0420 0.0450 0.1760 0.0580 0.0770 0.0703 0.0447 0.1802 0.0610 0.0808 1411 1446 1264 1378 1530 Table 8. Probabilities of Type I error when (p, k, q) — (10,2,4) and A = (1'8, 1.5,10)'. n LRT BART F Tk,q P^* 20 0.1690 0.0390 0.0470 0.6550 0.0400 0.2530 0.1701 0.0395 0.6560 0.0401 0.2540 1048 1031 1023 1001 1027 30 0.1560 0.0490 0.0440 0.5680 0.0620 0.2200 0.1590 0.0513 0.5706 0.0630 0.2226 1087 1146 1032 1022 1048 50 0.1350 0.0450 0.0470 0.4500 0.0750 0.1860 0.1366 0.0457 0.4516 0.0758 0.1873 1119 1065 1054 1052 1057 75 0.1160 0.0450 0.0440 0.3860 0.0740 0.1400 0.1194 0.0474 0.3893 0.0762 0.1427 1152 1190 1056 1094 1075 118 Table 9. Probabilities of Type I error when (p, k, q) — (1 0 ,2 ,1) and A = (lg, 10,25)' n LRT BART F Tk,q T* rji** 20 0.0660 0.0480 0.0470 0.0870 0.0600 0.0650 0.0681 0.0498 0.0899 0.0630 0.0679 1528 1537 2072 4398 3441 30 0.0640 0.0530 0.0480 0.0760 0.0540 0.0560 0.0655 0.0545 0.0779 0.0560 0.0580 1804 1953 2584 8568 6664 50 0.0530 0.0500 0.0510 0.0680 0.0560 0.0560 0.0522 0.0492 0.0670 0.0550 0.0550 2340 2369 3796 7614 7614 75 0.0520 0.0500 0.0500 0.0580 0.0520 0.0520 0.0520 0.0500 0.0580 0.0520 0.0520 2967 3241 6887 24700 24700 Table 10. Probabilities of Type I error when (p, &, q) = (10, 2 ,1) and A = (lg, 5,10)' n LRT BART F Tk,q 20 0.0590 0.0400 0.0530 0.0860 0.0580 0.0620 0.0575 0.0388 0.0833 0.0555 0.0595 1316 1258 2098 2785 2476 30 0.0650 0.0550 0.0570 0.0700 0.0540 0.0540 0.0598 0.0502 0.0635 0.0480 0.0480 1988 1947 3505 4310 4310 50 0.0560 0.0540 0.0490 0.0650 0.0580 0.0580 0.0568 0.0548 0.0660 0.0590 0.0590 2670 2617 3863 5024 5024 75 0.0560 0.0490 0.0520 0.0580 0.0560 0.0560 0.0542 0.0474 0.0561 0.0541 0.0541 3935 2806 6890 6704 6704 119 Table 11. Probabilities of Type I error when (p, k, q) = (10, 2 ,1) and A = (lg, 2,10)' n LRT BART F Tk,q T* rp** 20 0.0970 0.0560 0.0530 0.1250 0.0730 0.0790 0.0960 0.0552 0.1231 0.0717 0.0775 1069 1080 1236 1168 1202 30 0.0890 0.0580 0.0470 0.1020 0.0660 0.0710 0.0909 0.0594 0.1044 0.0675 0.0727 1270 1226 1432 1234 1286 50 0.0790 0.0540 0.0560 0.0780 0.0710 0.0720 0.0751 0.0508 0.0727 0.0665 0.0674 1451 1408 2336 1849 1905 75 0.0570 0.0480 0.0510 0.0630 0.0560 0.0580 0.0563 0.0473 0.0622 0.0553 0.0572 2003 1919 2393 2043 2208 Table 12. Probabilities of Type I error when (p, k, q) = (10, 2 ,1) and A = (lg, 1.5,10)' n LRT BART F Tk,q T* 20 0.1210 0.0710 0.0520 0.1520 0.0850 0.0940 0.1204 0.0705 0.1509 0.0844 0.0934 1049 1040 1130 1050 1057 30 0.1220 0.0700 0.0530 0.1270 0.0880 0.1050 0.1205 0.0686 0.1252 0.0870 0.1038 1138 1199 1197 1071 1087 50 0.1080 0.0740 0.0490 0.1220 0.0920 0.089 0.1086 0.0745 0.1226 0.0924 0.0894 1224 1206 1203 1085 1101 75 0.0910 0.0640 0.0400 0.0920 0.0670 0.0660 0.0971 0.0698 0.0989 0.0728 0.0716 1207 1280 1275 1261 1238 120 Table 13. Probabilities of Type I error when ( p , k , q ) = (10,4,2) and A (1^,10,10,10,25)'. 
n LET BART F Tk,q T* 20 0.1350 0.0610 0.0480 0.2910 0.0700 0.1240 0.1362 0.0620 0.2925 0.0718 0.1258 1151 1228 1140 2397 1553 30 0.0870 0.0530 0.0430 0.1850 0.0550 0.0750 0.0920 0.0570 0.1910 0.0614 0.0818 1354 1374 1246 2960 2243 50 0.0710 0.0400 0.0390 0.1150 0.0500 0.0580 0.0787 0.0460 0.1251 0.0606 0.0688 1385 1409 1454 3703 2933 75 0.0640 0.0450 0.0400 0.0920 0.0490 0.0510 0.0717 0.0521 0.1015 0.0586 0.0609 1604 1812 1698 4286 4450 Table 14. Probabilities of Type I error when (p,k,q) = (10,4,2) and A ( I ', 5,5,10,10)'. n LRT BART F Tk tq T* rj-*** 20 0.1620 0.0630 0.0480 0.3380 0.0620 0.1400 0.1631 0.0639 0.3394 0.0635 0.1418 1124 1173 1109 1857 1418 30 0.0950 0.0490 0.0430 0.2130 0.0510 0.0810 0.0999 0.0522 0.2188 0.0566 0.0870 1309 1228 1199 2175 1700 50 0.0710 0.0460 0.0390 0.1290 0.0540 0.0670 0.0784 0.0519 0.1390 0.0639 0.0774 1346 1330 1377 2500 2148 75 0.0650 0.0490 0.0400 0.1000 0.0510 0.0530 0.0727 0.0558 0.1094 0.0606 0.0626 1588 1610 1600 3773 3398 121 Table 15. Probabilities of Type I error when (p, k, q) = (10,4,2) and A (T6, 2,5,10,10)'. n LRT BART F Tk,q T* 20 0.2140 0.0650 0.0440 0.4570 0.0820 0.1810 0.2171 0.0670 0.4603 0.0842 0.1844 1070 1084 1052 1080 1102 30 0.1340 0.0550 0.0460 0.3230 0.0630 0.1170 0.1365 0.0568 0.3258 0.0648 0.1199 1179 1199 1112 1184 1282 50 0.0820 0.0430 0.0390 0.1870 0.0580 0.0890 0.0884 0.0478 0.1963 0.0644 0.0965 1202 1208 1214 1300 1273 75 0.0850 0.0590 0.0500 0.1520 0.0700 0.0840 0.0850 0.0590 0.1520 0.0700 0.0840 1344 1520 1415 2018 1764 Probabilities of Type I error when (p, k,q) = (10,4, 10)'. n LET BART F Tk,q T* ji** 20 0.2320 0.0650 0.0470 0.5070 0.0760 0.1980 0.2336 0.0661 0.5085 0.0769 0.1994 1078 1103 1045 1060 1064 30 0.1610 0.0640 0.0450 0.3940 0.0890 0.1570 0.1642 0.0661 0.3972 0.0911 0.1599 1153 1146 1078 1102 1122 50 0.1260 0.0590 0.0630 0.3280 0.1070 0.1420 0.1164 0.0517 0.3192 0.1020 0.1366 1425 1510 1151 1106 1095 75 0.0940 0.0500 0.0400 0.2190 0.0750 0.1070 0.1011 0.0547 0.2269 0.0786 0.1124 1293 1216 1161 1079 1132 122 Table 17. Probabilities of Type I error when (p, k, q) = (10,4,5) and A (T6, 10,10,10,25)'. n LRT BART F Tk,q T* n^** 20 0.1560 0.0740 0.0800 0.7100 0.0840 0.3190 0.1370 0.0593 0.7005 0.0590 0.2968 1287 1347 1036 2968 1227 30 0.1120 0.0600 0.0650 0.4460 0.0890 0.1770 0.1019 0.0518 0.4371 0.0746 0.1638 1378 1469 1094 3202 1477 50 0.0750 0.0550 0.0500 0.2470 0.0660 0.0890 0.0750 0.0550 0.2470 0.0660 0.0890 1605 1905 1191 3918 2167 75 0.0660 0.0510 0.0430 0.1750 0.0570 0.0700 0.0720 0.0568 0.1810 0.0639 0.0768 1950 2346 1268 3896 2481 Table 18. Probabilities of Type I error when (p,k,q) = (10,4,5) and A (%, 5,5,10,10)'. n LRT BART F Tk,q T* rji** 20 0.1560 0.0460 0.0710 0.7980 0.0410 0.3500 0.1433 0.0397 0.7934 0.0333 0.3356 1224 1155 1019 1288 1157 30 0.1170 0.0560 0.0640 0.5180 0.0670 0.1880 0.1078 0.0489 0.5108 0.0568 0.1759 1337 1412 1067 2041 1419 50 0.0760 0.0500 0.0500 0.2890 0.0650 0.0980 0.0760 0.0500 0.2890 0.0650 0.0980 1591 1701 1148 2967 1939 75 0.0620 0.0480 0.0440 0.2070 0.0590 0.0730 0.0670 0.0525 0.2120 0.0649 0.0788 2034 2123 1214 3760 2406 123 Table 19. Probabilities of Type I error when ( p , k , q ) — (10,4,5) and A ( I z6, 2 ,5 ,10 ,10 )'. 
n LRT BART F Tk,q T* J - I * * 20 0.1800 0.0340 0.0790 0.9010 0.0460 0.4900 0.1637 0.0271 0.8979 0.0387 0.4763 1183 1143 1009 1118 1069 30 0.1340 0.0430 0.0640 0.7620 0.0730 0.3220 0.1243 0.0385 0.7584 0.0680 0.3123 1327 1176 1021 1126 1150 50 0.0930 0.0460 0.0500 0.4940 0.0740 0.1590 0.0930 0.0460 0.4940 0.0740 0.1590 1492 1413 1057 1430 1295 75 0.0690 0.0470 0.0460 0.3230 0.0780 0.1200 0.0724 0.0500 0.3258 0.0810 0.1233 1927 2215 1112 1547 1402 Probabilities of Type I error when (p, k, q) = (10,4, 10)'. n LRT BART F Tk,q T* r j p * * 20 0.2020 0.0250 0.0630 0.9450 0.0560 0.5740 0.1936 0.0220 0.9442 0.0541 0.5694 1182 1143 1003 1023 1031 30 0.1620 0.0260 0.0680 0.8490 0.0990 0.4540 0.1515 0.0214 0.8461 0.0955 0.4454 1189 1196 1013 1027 1061 50 0.1080 0.0220 0.0490 0.7180 0.1260 0.3170 0.1087 0.0223 0.7183 0.1264 0.3175 1338 1285 1020 1066 1057 75 0.1040 0.0420 0.0520 0.5620 0.1330 0.2540 0.1022 0.0408 0.5611 0.1321 0.2527 1763 1720 1044 1103 1121 124 Table 21. Probabilities of Type I error when (p, k, q) = (15 ,2 ,1) and A = (Iz13, 15,35)' n LRT BART F Tk,q T* n^** 25 0.0610 0.0470 0.0580 0.0650 0.0450 0.0470 0.0606 0.0468 0.0647 0.0455 0.0465 1001 1000 1001 1004 1004 35 0.0490 0.0390 0.0520 0.0590 0.0410 0.0420 0.0489 0.0389 0.0580 0.0409 0.0419 1002 1004 1001 1001 1001 50 0.0380 0.0310 0.0460 0.0500 0.0370 0.0370 0.0381 0.0311 0.0503 0.0373 0.0373 1001 1000 1006 1007 1007 75 0.0400 0.0370 0.0560 0.0410 0.0380 0.0380 0.0396 0.0366 0.0405 0.0374 0.0374 1007 1008 1010 1012 1012 Table 22. Probabilities of Type I error when (p, k, q) = (15,2,1) and A = (1'13, 10,15)' n LRT BART F Tk,q T* J-*** 25 0.0620 0.0460 0.0580 0.0610 0.0480 0.0490 0.0616 0.0458 0.0606 0.0475 0.0485 1001 1000 1001 1004 1003 35 0.0470 0.0380 0.0520 0.0580 0.0440 0.0450 0.0469 0.0370 0.0579 0.0439 0.0449 1003 1005 1001 1001 1001 50 0.0370 0.0310 0.0460 0.0480 0.0380 0.0380 0.0461 0.0394 0.0618 0.0520 0.0520 1001 1001 1004 1006 1006 75 0.0560 0.0480 0.0460 0.0680 0.0600 0.0600 0.0563 0.0482 0.0684 0.0602 0.0602 1005 1001 1005 1002 1002 125 Table 23. Probabilities of Type I error when (p, k, q) — (15, 2 ,1) and A = (Iz13, 2,15)'. (1'13,1.5,15)'. n LRT BART F Tk,q T* J-*** 25 0.1000 0.0740 0.0580 0.0830 0.0630 0.0650 0.0995 0.0735 0.0822 0.0625 0.0644 1002 1003 1006 1003 1005 35 0.0800 0.0570 0.0520 0.0720 0.0540 0.0570 0.0798 0.0568 0.0719 0.0539 0.0569 1009 1013 1000 1001 1003 50 0.0900 0.0650 0.0450 0.0670 0.0640 0.0660 0.0901 0.0650 0.0668 0.0638 0.0658 1000 1000 1001 1001 1001 75 0.0660 0.0550 0.0560 0.0470 0.0400 0.0410 0.0655 0.0548 0.0466 0.0397 0.0407 1005 1001 1004 1003 1003 Probabilities of Type I error when (p, k, q) = (15,2, n LRT BART F Tk,q rj-*** 25 0.1230 0.0850 0.0580 0.0890 0.0650 0.0710 0.1221 0.0844 0.0881 0.0645 0.0701 1005 1003 1007 1003 1009 35 0.1000 0.0620 0.0520 0.0910 0.0730 0.0790 0.0998 0.0618 0.0909 0.0728 0.0788 1007 1008 1002 1005 1004 50 0.1060 0.0780 0.0530 0.0860 0.0860 0.0820 0.1059 0.0778 0.0859 0.0860 0.0820 1001 1002 1001 1000 1000 75 0.0930 0.0640 0.0560 0.0740 0.0620 0.0700 0.0923 0.0635 0.0736 0.0616 0.0694 1007 1006 1004 1004 1007 126 Table 25. Probabilities of Type I error when ( p , k , q ) = (15,4,2) and A (I ' , , 15,15,15,35)'. 
n LRT BART F Tk,q T* J-*** 25 0.1360 0.0700 0.1520 0.2080 0.0690 0.0930 0.1247 0.0626 0.1919 0.0623 0.0820 1013 1010 1020 1008 1018 35 0.1050 0.0640 0.0830 0.1440 0.0710 0.0860 0.1005 0.0602 0.1379 0.0675 0.0809 1015 1016 1021 1013 1024 50 0.0740 0.0440 0.0660 0.0113 0.0580 0.0620 0.0722 0.0419 0.0110 0.0564 0.0602 1012 1025 1009 1011 1013 75 0.0770 0.0630 0.0640 0.0910 0.0610 0.0620 0.0744 0.0604 0.0882 0.0586 0.0597 1024 1035 1030 1030 1029 Table 26. Probabilities of Type I error when (p,k,q) = (15,4,2) and A (T11, 10,10,15,15)'. n LRT BART F Tk,q T* rji** 25 0.1350 0.0710 0.1520 0.2100 0.0710 0.0940 0.1243 0.0629 0.1933 0.0637 0.0839 1012 1012 1021 1010 1014 35 0.1060 0.0680 0.0830 0.1530 0.0710 0.0820 0.1016 0.0639 0.1468 0.0671 0.0776 1014 1018 1021 1016 1018 50 0.0750 0.0440 0.0660 0.1120 0.0580 0.0630 0.0729 0.0419 0.1100 0.0564 0.0612 1015 1025 1009 1011 1013 75 0.0740 0.0620 0.0640 0.0880 0.0610 0.0610 0.0714 0.0594 0.0853 0.0586 0.0586 1031 1036 1027 1030 1030 127 Table 27. Probabilities of Type I error when ( p , k , q ) = (15,4,2) and A (In , 2,10,15,15)'. n LRT BART F Tk,q T* rji** 25 0.1970 0.0990 0.1520 0.2950 0.0550 0.1110 0.1843 0.0880 0.2830 0.0490 0.1014 1012 1017 1008 1008 1011 35 0.1450 0.0880 0.0830 0.2320 0.0540 0.0820 0.1407 0.0851 0.2273 0.0520 0.0784 1010 1007 1008 1005 1011 50 0.1030 0.0590 0.0660 0.1650 0.0510 0.0740 0.1009 0.0572 0.1619 0.0493 0.0724 1012 1015 1071 1015 1008 75 0.0960 0.0600 0.0640 0.1240 0.0650 0.0690 0.0937 0.0580 0.1209 0.0636 0.0672 1019 1020 1026 1009 1015 Probabilities of Type I error when (p,k,q) = (15,4, 5,15)'. n LRT BART F Tk,q T* 25 0.2290 0.1160 0.1520 0.3460 0.0650 0.1300 0.2162 0.1054 0.3386 0.0617 0.1227 1011 1013 1003 1002 1005 35 0.1790 0.0870 0.0830 0.2970 0.0660 0.1080 0.1746 0.0854 0.2912 0.0653 0.1054 1009 1002 1011 1000 1005 50 0.1230 0.0700 0.0660 0.2340 0.0760 0.0910 0.1210 0.0683 0.2323 0.0752 0.0894 1009 1010 1003 1002 1007 75 0.1240 0.0780 0.0640 0.1940 0.0750 0.0980 0.1216 0.0761 0.1911 0.0743 0.0967 1015 1015 1017 1002 1006 128 The following tables report power of the tests when p = 10, k — 2 ,4, and q = 1,2. Figues compare the power of the BART and T* tests both of which are controlled for Type I error. Figure 3. Power of test when A = (lg, 10,25)', k = 2, q = I. BART - 0 - Tstar Table 29. Power of test when A = (1'8, 10,25)', k — 2, q — l. Power of test n LRT BART Tk,q rji* 25 0.3330 0.2980 0.3550 0.2880 0.2960 50 0.5890 0.5570 0.5720 0.5390 0.5410 75 0.7480 0.7410 0.7540 0.7290 0.7300 100 0.8580 0.8510 0.8540 0.8450 0.8450 150 0.9620 0.9620 0.9640 0.9640 0.9640 129 Figure 4. Power of test when A = (Ix8, 2,10)', k — 2, q = I. BART -Q - Tslar Table 30. Power of test when A = (18, 2,10)', k — 2, q — I. Power of test n LRT BART Tk,q rji** 25 0.3150 0.2410 0.3200 0.1710 0.2000 50 0.4580 0.4050 0.4620 0.3230 0.3640 75 0.5960 0.5580 0.5870 0.4940 0.5180 100 0.6930 0.6540 0.6610 0.6030 0.6240 150 0.8750 0.8600 0.8660 0.8490 0.8530 130 Figure 5. Power of test when A = (lg, 10,10,10,25)', A; = 4, g = 2. BART Tstar Table 31. Power of test when A = (1'6, 10,10,10,25)', k = A, q = 2. Power of test n LRT BART Tk,q J-*** 25 0.2750 0.1690 0.4130 0.1680 0.2330 50 0.3420 0.2700 0.4150 0.2710 0.2940 75 0.4990 0.4360 0.5370 0.4470 0.4600 100 0.6130 0.5820 0.6360 0.5780 0.5800 150 0.8230 0.8070 0.8230 0.8010 0.8040 131 Figure 6. Power of test when A = (lg, 2, 5,10,10)', = 4, q = 2. Q- 0.4 BART -Q - Tstar Table 32. Power of test when A = (lg, 2,5,10,10)', k — 4, q = 2. 
Power of test
  n      LRT      BART     Tk,q     T*       T**
  50     0.2900   0.1940   0.4070   0.1680   0.2200
  75     0.3770   0.3010   0.4570   0.2550   0.2870
  100    0.4840   0.4300   0.5320   0.3900   0.4160
  150    0.6240   0.5840   0.6510   0.5590   0.5710
  175    0.7020   0.6720   0.7160   0.6620   0.6670
  200    0.7890   0.7650   0.8000   0.7530   0.7550

Discussion

The results of the simulation study of the variable reduction tests reported in Tables 5-28 show that the likelihood ratio test (LRT) can be improved by a Bartlett correction, especially when λ_k is quite small (see Tables 7, 8, 11, 12, 15, 16, 19, 20, 23, 24, 27, and 28). In these cases, the LRT becomes liberal whereas the BART is still conservative. With a few exceptions in Tables 8 and 19-22, the BART gives probabilities of Type I error between 0.04 and 0.10.

In general, the LRT performs better than T_{k,q}, especially when k + q is large relative to p. In these cases, the LRT continues to perform well whereas T_{k,q} becomes liberal (see Tables 17-20). The T_{k,q} test gives probabilities of Type I error up to 0.945 (see Table 20). The T** test does not perform nearly as well as T* and BART unless k + q is small (see Tables 9-12 and 21-24) or the sample size is quite large. Under most conditions, the BART results are better than those of T*.

The effective sample size, N*, increases with the sample size. The comparison test is useful when q is not far from m = p − k. If q is far from m, N* is close to N and the relative efficiency is almost 1 (see Tables 21-28).

With respect to power, it is sufficient to compare the BART with T*, both of which are controlled for Type I error. The results are reported in Figures 3-6. Only a few cases were examined. These figures show that the Bartlett-corrected test has greater power than T*, especially when λ_k is quite small (see Figures 4 and 6).

Simulation Study of Component Reduction Tests

The evaluation is based on the percentage coverage of confidence intervals for the cumulative proportion of total variance explained by the first k components, under different conditions. The intervals examined are

o the likelihood ratio interval (LR I),
o the Bartlett-corrected interval (BAR I), and
o Anderson's interval (Z I), obtained by inverting (1.6).

To obtain the LR I and BAR I, a confidence interval for this proportion can be constructed by inverting the LRT or BART of H_0: the proportion equals c_0 against the two-sided alternative. The interval consists of all values c_0 for which H_0 cannot be rejected. That is, a (1 − α)100% CI can be obtained as {c_0 : Q(c_0) ≤ χ²_{1, 1−α}}, where Q is the LRT or BART statistic.

In this simulation study, the LR I and BAR I are obtained by applying the fminsearch command in Matlab (see the Matlab subprograms in the Appendices). This subprogram finds a local minimizer c_0 of the quantity (p-value − α)². The interval (c_{01}, c_{02}) is found such that the p-values for testing H_0 at c_{01} and at c_{02} are each equal to α. The endpoints of Anderson's intervals are used as initial guesses.
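For illustration, the inversion can be sketched in a few lines of Matlab. This is a minimal sketch rather than the thesis subprogram: pvalue_fn, z_lower, and z_upper are hypothetical placeholders for, respectively, a routine that returns the LRT (or BART) p-value for testing H_0 at a trial value c_0 and the endpoints of Anderson's interval.

  % Minimal sketch (not the thesis subprogram) of the interval inversion
  % described above.  pvalue_fn(c0) is assumed to return the LRT or BART
  % p-value for testing H0 at the trial value c0; z_lower and z_upper are
  % the endpoints of Anderson's interval, used as initial guesses.
  alpha = 0.05;
  obj   = @(c0) (pvalue_fn(c0) - alpha)^2;   % quantity minimized in the text
  c01   = fminsearch(obj, z_lower);          % lower endpoint of the interval
  c02   = fminsearch(obj, z_upper);          % upper endpoint of the interval
  ci    = [c01, c02];                        % approximate (1 - alpha)100% CI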
The number of original variables, p, was 8. The number of retained components, k, was 2 and 4. For each simulation condition, 1000 data sets were generated. For simplicity, the covariance matrices examined were diagonal with λ_1 = ··· = λ_{p−k} = 1; the values of the remaining eigenvalues are listed in the tables below. The vector c_1 was chosen so that the parameter of interest is the cumulative proportion of total variance explained by the first k components. The nominal test size was 0.05. Tables of the percentage coverage of the confidence intervals are shown below.

Table 33. Percentage coverage of CI when (p, k) = (8, 2) and λ = (1'_6, 10, 25)'.
  n     LR I     BAR I    Z I
  20    90.70    92.40    88.90
  75    94.10    94.90    93.90

Table 34. Percentage coverage of CI when (p, k) = (8, 2) and λ = (1'_6, 2, 10)'.
  n     LR I     BAR I    Z I
  20    84.70    90.10    84.90
  75    92.30    93.50    92.60

Table 35. Percentage coverage of CI when (p, k) = (8, 4) and λ = (1'_4, 10, 10, 10, 25)'.
  n     LR I     BAR I    Z I
  20    78.80    93.90    72.50
  75    90.40    94.90    89.90

Table 36. Percentage coverage of CI when (p, k) = (8, 4) and λ = (1'_4, 2, 10, 10, 10)'.
  n     LR I     BAR I    Z I
  20    62.30    91.00    60.10
  75    87.40    93.60    87.40

The results of the component reduction simulation reported in Tables 33-36 suggest that the Bartlett-corrected intervals perform better than the other intervals. The percentage coverage of the Bartlett-corrected intervals is closer to 95% than that of the likelihood ratio intervals and Anderson's intervals in all conditions.

Numerical Examples

Example 1. (continued) (The study of the quality of pictures)

In Chapter 1, the following two questions were addressed.
1. How many components should be retained?
2. How many and which variables can be considered redundant?

To answer these questions, a confidence interval was computed with c_1 = (0 0 0 0 0 0 0 1 1)'; that is, the CI for the cumulative proportion of total variance accounted for by the first two components. The 95% confidence interval is (0.7422, 0.7690). It can be concluded that the first two components are sufficient to explain the variability.

The focus now is on the first two components. From Table 2, the first two components have small coefficients on the last three variables. The question is whether these three variables can be eliminated. Consider hypothesis (1.21), where k = 2, q = 3, and n = 108; that is, the last three variables have zero coefficients in each of the first two components. The likelihood ratio test statistic is 3.36, which asymptotically follows a χ² distribution with kq = 6 degrees of freedom, and the corresponding p-value is 0.7613. The hypothesis could not be rejected at either the 0.01 or 0.05 significance level. Accordingly, these three variables are redundant and can be discarded. The Bartlett-corrected test statistic is 3.41 and the corresponding p-value is 0.7549. This leads to the same conclusion.

Example 2. (continued) (Women's track records)

In Chapter 1, two questions were addressed: how many components should be retained, and which polynomials contribute to the largest component? To answer these questions, a confidence interval was computed with c_1 = (0 0 0 0 1 1)'. The 95% CI for the cumulative proportion of total variance accounted for by the first two components is (0.7823, 0.8282). Therefore, the first two components should be retained.

To reduce the dimension in the number of variables, hypothesis (1.21) was examined, where k = 1, 2 and q = 1, 2, 3, 4, 5; that is, the first k components have zero loadings on the last q variables.

Table 37. P-values of the hypothesis tests: women's track example.
  k    q    LRT        p-value      BART       p-value
  1    1    0.5877     0.4433       0.6082     0.4355
  1    2    2.5382     0.2811       2.5549     0.2787
  1    3    8.7912     0.0322*      8.5538     0.0359*
  1    4    9.7598     0.0447*      9.3310     0.0533
  1    5    14.8112    0.0112*      13.9096    0.0162*
  2    1    0.6040     0.7393       0.5991     0.7411
  2    2    8.5104     0.0746       8.2538     0.0827
  2    3    31.3926    <0.0001**    30.1616    <0.0001**
Note: * denotes rejection of H_0 at the 0.05 level; ** denotes rejection of H_0 at the 0.01 level.

For each of the hypotheses, the likelihood ratio test statistic, the Bartlett-corrected test statistic (each of which asymptotically follows a χ² distribution with kq degrees of freedom), and the corresponding p-values are presented in Table 37. Failure to reject H_0 means that, in the first k components, all variability in speed over track events is related to the first p − q polynomials; the last q polynomials are redundant.
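As a check on the arithmetic in these examples, the reported p-values can be reproduced with a few lines of Matlab. The sketch below is not part of the thesis code; it assumes only that each statistic is referred to a χ² distribution with kq degrees of freedom and uses chi2cdf from the Statistics Toolbox, the same function used in the programs in the Appendices.

  % Minimal sketch: reproduce the Example 1 p-values (up to rounding of
  % the reported statistics) by referring each statistic to a chi-square
  % distribution with k*q degrees of freedom.
  k = 2;  q = 3;  df = k*q;            % hypothesis (1.21) for Example 1
  lrt  = 3.36;                         % likelihood ratio statistic
  bart = 3.41;                         % Bartlett-corrected statistic
  p_lrt  = 1 - chi2cdf(lrt,  df);      % approximately 0.76
  p_bart = 1 - chi2cdf(bart, df);      % approximately 0.76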
CHAPTER 6

CONCLUSION

In this thesis, new procedures for examining dimension reduction in principal component analysis were developed in Chapters 2, 3, and 4. The Bartlett-corrected test of redundancy appears to be promising. The tabulated results show an improvement in Type I error control for the Bartlett-corrected test when compared to the likelihood ratio test and other competing tests. Figures 3-6 show that the Bartlett-corrected test has power as good as or better than the competing test in all conditions that were examined.

The Bartlett-corrected test also appears to be promising with respect to the percentage coverage of confidence intervals for the cumulative proportion of total variance. The results show that the Bartlett-corrected intervals substantially improve control of the coverage percentage under the conditions examined.

There are some limitations of this thesis. First, the procedures for estimating MLEs, for constructing LRTs, and for evaluating Bartlett corrections are derived under normality; the results do not apply to non-normal data. Second, the proposed model is based on the covariance matrix; it does not apply to the correlation matrix. A third limitation is computational intensity. The computation of MLEs and the construction of LRTs are fast, but computing the Bartlett correction is computationally intensive, and the computer time increases as p increases. This may not be noticeable when computing on just one data set, but a simulation study may take a long time. Efficient programming is needed to shorten the computer time.

For future research, it would be useful to construct one-sided confidence intervals for the cumulative proportion of total variance. Deriving Edgeworth approximations and saddlepoint approximations for the distributions of estimators of eigenvectors and eigenvalues could also be an interesting topic. To shorten the computer time, algebraic simplification of the Bartlett equations could be pursued. Under non-normality, adopting robust methods such as M-estimators could be an important extension. Another interesting topic would be to examine dimension reduction in a correlation matrix. Finally, it would be useful to develop guidelines for a general set of methods for exploratory and confirmatory principal component analysis. In particular, for confirmatory PCA, which involves several types of hypothesis tests, future research could be driven by the demands of applications. The present research is one part of confirmatory PCA.

REFERENCES

[1] Anderson, T.W. (1963). Asymptotic theory for principal component analysis. Annals of Mathematical Statistics, 10, 203-224.
[2] Anderson, T.W. (1984). Introduction to Multivariate Statistical Analysis, Second edition. New York: Wiley.
[3] Bartlett, M.S. (1954). A note on the multiplying factors for various χ² approximations. Journal of the Royal Statistical Society, Series B, 16, 296-298.
[4] Beale, E.M.L., Kendall, M.G. and Mann, D.W. (1967). The discarding of variables in multivariate analysis. Biometrika, 54, 357-366.
[5] Bentler, P.M. and Yuan, K.-H. (1996). Test of linear trend in eigenvalues of a covariance matrix. The British Journal of Mathematical and Statistical Psychology, 49, 299-312.
[6] Boik, R.J. (1998). A local parameterization of orthogonal and semi-orthogonal matrices with applications. Journal of Multivariate Analysis, 67, 244-276.
[7] Boik, R.J. (2002a). Spectral models for covariance matrices. Biometrika, 89, 159-182.
[8] Boik, R.J. (2002b). Principal components models for correlation matrices, in press.
[9] Box, G.E.P. (1949).
A general distribution theory for a class of likelihood criteria. Biometrika, 36, 317-346. [10] Cattell, R.B. (1966). The scree test for the number of factors. Journal of Multi­ variate Behav. Res., I, 245-276. [11] Cliff, N. (1966). Orthogonal rotation to congruence. Psychometrika, 31, 33-42. [12] Cox, D.R. and Hinkley, D.V. (1974). Theoretical Statistics. Chapman and Hall. [13] Dawkins, B. (1989). Multivariate analysis of national track records. The Ameri­ can Statistician, 43, 110-115. 141 [14] Diimbgen L. (1995). Likelihood ratio tests for principal components. Journal of Multivariate Analysis, 52, 245-258. [15] Flury B.N. (1986). An asymptotic test for redundancy of variables in the compar­ ison of two covariance matrices. Statistics and Probability Letters, 4, 123-126. [16] Flury B.N. (1988). Common Principal Components and Related Multivariate Models. New York: Wiley. [17] Fujikoshi, Y. (1985). Selection of variables in discriminant analysis and canonical correlation analysis. Multivariate Analysis-VI, P.K. Krishnaiah, ed., Elsevier, 219-236. [18] Fujikoshi, Y. (1989). Tests for redundancy of some variables in multivariate anal­ ysis. Statistical data analysis and inference, New York: Elsevier. [19] Fujikoshi, Y. and Rao, C.R. (1991). Selection of covariance in the growth curve model. Biometrika, 78, Issue 4, 779-785. [20] Gleason, T.C. (1976). On redundancy in canonical analysis. Psychological, 83, No.6, 1004-1006. [21] Huang, D. and Tseng, S. (1992). A decision procedure for determining the num­ ber of components in principal component analysis. Journal of Statistical Planning and Inference, 30,63-71. [22] Jackson, J.E. and R.H. Morris (1957). An application of multivariate quality control to photographic processing. Journal Amer. Statist. Assoc, 52,186- 199. [23] JolliTe, I.T. (1972). Discarding variables in a principal component analysis, I: Artificial data. Appl Statist., 21, 160-173. [24] JolliTe, I.T. (1973). Discarding variables in a principal component analysis, II: Real data. Appl Statist., 22, 22-31. [25] JolliTe, I.T. (1986). Principal Component Analysis. New York: Springer. [26] JolliTe, I.T. and Cadima J. (1995). Loadings and correlations in the interpreta­ tion of principal components. Journal of Applied Statistics, 22, No.2, 203-214. [27] Kaiser, H.F. (1958). The varimax criterion for analytic rotation in factor analysis. Psychometrika, 23, 187-200. 142 [28] Lawley, D.N. (1956). Tests of significance for the latent roots of covariance and correlation matrices. Biometrika, 43, 128-136. [29] Lazraq, A. (2001). Statistical inference concerning several redundancy indices. Journal of Multivariate Analysis, 79, 71-88. [30] Magnus, J.R. and Neudecker, H. (1979). The commutation matrix: some prop­ erties and applications. Ann. Statist., 7 , 381-394. [31] Magnus, J.R. and Neudecker, H. (1999). Matrix Differential Calculus with Ap­ plications in Statistics and Econometrics, Revised Ed. Chichester: Wiley. [32] Mathworks, Inc. (2000). Matlab Version 6.1. Mathworks, Inc. [33] McCabe, G.P. (1984). Principal variables. Technometrics, 26, 137-144. [34] Naik, D.N. and Khattree, R. (1996). Revisiting Olympic track records: Some practical considerations in the principal component analysis. The American Statistician, 50, 140-144. [35] Ouellette, D.V. (1981). Schur complements and statistics. Linear Algebra and its Applications, 36, 187-295. [36] Rencher, A.C. (1995). Methods of Multivariate Analysis. New York: Wiley. [37] Schott, J.R. (1991). 
Testing for the redundancy of variables in principal compo­ nents analysis. Statistics and Probability Letters, 11, 495-501. [38] Seber, G.A.F. (1984). Multivariate Observations. New York: Wiley. [39] Sen, P.K. and Singer, J.M. (1974). Large Sample Methods in Statistics: An in­ troduction with applications. Chapman and Hall. [40] Severini, T.A. (2000). Likelihood Methods in Statistics. Oxford University Press. [41] Sugiyama, T. and Tong, H. (1976). On a statistic useful in dimensionality reduc­ tion in multivariate linear stochastic system. Commun. Statist., AS, 711-721. [42] Taylor, A.E. and Mann, W.R. (1983). Advanced Calculus. New York: Wiley. [43] Tyler, D.E. (1981). Asymptotic inference for eigenvectors, The Annals of Statis­ tic, 9, 725-736. 143 [44] Van Den WolIenberg, A.L. (1977). Redundancy analysis an alternative for canon­ ical correlation analysis. Psychometrika, 42, No.2, 207-219. 144 APPENDICES 145 APPENDIX A Notation 146 Notation diag(o) det tr exp dim O(P) vec(A) dvec(o,p,p) © © %(T) Wp(m,2) H-) £(■) IIXII2 L IP e? I(p,p) D1-S D f Dp N p Ie Ie V'W diagonal matrix having diagonal elements as a vector a determinant trace exponential function dimension group of orthogonal p x p matrices stacking the columns of A into a single column vector converging a vector o to a p x p matrix direct or Kronecker product direct sum the operation is to be performed element-wise subspace spanned by the columns of a matrix T p-variate normal distribution with mean /Lt and covariance matrix S p x p matrix variate Wishart distribution with n degrees of freedom and covariance matrix S likelihood function loglikelihood function norm of X i.e YTi=I X? the matrix p2 x p of YT=I (ej ® ej) the p x p identity matrix the j th column of I p the p2 x p2 commutation matrix the ith derivative of vecG with respect to /V evaluated at p = O the ith derivative of A with respect to ip1 the duplication matrix the matrix ( lp2 + I(p,p)) /2 Fisher’s information the average Fisher’s information digamma function 147 APPENDIX B List of Matlab Subprograms 148 The following list describes the Matlab subprograms used in the simulation. 1. commute.m : Computes the permutation matrix, I(a,6)- 2. dup.m : Computes the duplication matrix, Dv. 3. kron.m : Computes Kronecker products of three or more input matrices. 4. makeJC.m : Constructs the elementary matrix, E i v^. 5. oplus.m : Computes the direct sum of two or more matrices. 6. vec.m : Stacks the columns of a matrix into a single column vector. 7. dvec.m : Converges a column vector into a matrix. 8. Pvechv2.m : Constructs the matrices of Al, A2, L, and {Ip2 — 9. solve_eta.m : Given parameter fi, solves for the implicit parameter r] under model (2.31). 10. solve_etal.m : Given parameter /i, solves for the implicit parameter 77 under model (2.33). 11. solve_eta2.m : Given parameter /n, solves for the implicit parameter r}2 under model (2.66). 12. solve_tau.m : Given parameter tp, solves for the implicit parameter r and com- ■ putes £ under model (2.11). 149 13. find-guess.m : Computes initial guesses for T0 and A0. 14. fisher_scoringl.m : Updates the parameters [i and tp under the parameterization of A in (2.4). 15. fisher_scoring2.m : Updates the parameters fj, and cp under the parameterization of A in (2.9). 16. find_Vstar.m : Computes V* = ( V 3 V4) ensured that the matrix of derivatives in (2.35) is nonsingular. This subprogram is used for estimating parameters. 17. 
find_Vstar_bart.m : Computes V* = (V3 V4) by using the row reduced echelon form to increase the number of zero entries in the matrices. This subprogram is used to compute a Bartlett correction to help reduce memory and run faster. 18. solve_mle.m : Computes the maximum likelihood estimates of T, A, and X. under model (2.31) and the parameterization of A in (2.4). 19. solve_mlel.m : Computes the maximum likelihood estimates of T, A, and S under model (2.33) and the parameterization of A in (2.4). 20. solve_mle2.m : Computes the maximum likelihood estimates of F, A, and X under model (2.31) and the parameterization of A in (2.9). 21. Bartlett.m : Computes E ^D(@; 6) j^ under model (2.31) and the parameteriza­ tion of A in (2.4). 150 22. Bartlettl.m : Computes E yD(0] 0)j under model (2.33) and the parameteri­ zation of A in (2.4). 23. Bartlett2.m : Computes E ^D(0; 0) j under model (2.31) and the parameteri­ zation of A in (2.9). 24. Exp_Da.m : Computes E (d (6; 0) j under an unrestricted H0. 25. schott_func.m : Computes Tk>q, T*, T**, and their p-values. 26. Cl function : Computes a local minimizer Cq such that the p-value for testing Ho: = c0 is equal to a by minimizing (p-value — a)2. 151 APPENDIX C Programming Codes 152 The following codes for Matlab version 5.3 and 6.1 (Mathworks Inc., 2000) can be used to compute the maximum likelihood estimates of F, A, and S , likelihood ratio test statistics, and Bartlett-corrected test statistics, including p-values. Matrices can be stored as sparse matrices to reduce memory. The programming codes numbered 1-9 from the preceding list can be found on BoiMs webpage : . solve_eta2.m function. function [G,eta2] = solve_eta2(A,Al,A2,V3,M,mu,Gain) %% H given a solution for mu, solve for the implicit parameter eta %% %% etal must be zero then solve for eta.2%% 7.7. [p,q]= size(A);Ip = speye(p);Dp = dup(p); 7.7. 7.7. find Mc which is Ip taken off M ’s column 7.7. 7.7. b = M ’* [I :p] ’ ; Mc = Ip ; Mc(:,b) = []; 7.7. 7.7. initial guess of eta2 7.7. 7.7. eta.2 = vech(Ip); C = krqn(Me’,A’*Gam) ; VG = (Al*V3*mu)+(A2*eta2); G = dvec(VG,p,p); 7.7. 7.7. find eta2 using Newton’s method 7.7. 7.7. while norm(G*G’-Ip,’fro’)+ norm(C*vec(G),’fro’) > le-8 F = Dp’*vec(G*G’-Ip); J = 2*Dp’*kron(G,Ip)*A2; Jinv = inv(J); check = Jinv*F; eta2 = eta2-check; VG = (Al*V3*mu)+(A2*eta2); G = dvec(VG,p,p); end 7.7. end while loop 7.7. 153 solve_tau.m function. function [Xi,t au] = solve_tau(phi,X i ,T 4 ,V l ,V 2 ,C2) %% 0Zo0Zo given a solution for phi, solve for the implicit parameter tau Th Th initial guess of tau %% 0Zo0Z. tau = inv(Vl’*V1)*V11*X i ; 0Zo0Z. 0Zo0Zo find tau using Newton’s method Th Th while norm(C2’*exp(T4*Xi),’fr o ’) > le-8 F = C 2 ’*exp(T4*Xi); J = C 2 ’*diag(exp(T4*Xi))*T4*Vl; Jinv = inv(J); check = Jinv*F; tau = tau-check; Xi = Vl*tau+V2*phi;. end 0Zo0Zo end while loop %% find-guess.m function. function [GamO,IamO] = find_guess(A,U,S,p,q,m,order) 0Zo0Z. 0Zo0Zo find initial guesses for Gamma and Lamda Th Th using Orthogonal Rotation to Congruence Th 0Zo0Zo write S as its svd %% Th [Ghat,L h a t ,Gh]= svd(S); 0Zo0Z. 
0Zo0Zo obtain initial guess for lamda Th %% IamO = diag(Lhat); Th Th order eigenvalues from small to large and Th Th also the corresponding eigenvectors Th 'Th if order == 2 IamO = IamO([p:-!:!]); Ghat = Ghat(:,[p:- I :1]); LamO = diag(IamO); end; Th Th partition the eigenvector matrix Th Th Ghatl = Ghat(:,I :m ) ;Ghat2 = Ghat(:,m+1:p); Th 154 %% begin with any orthogonal matrix %% %% QOl = orth(randn(m,m)); check = Q O l ; Ql=Ones(m,m); %% %% do loop until Ql converges Th Th while (norm(Q1-check)) > le-4 check = QO l ; B2 = Ghatl*Q01’; B2 = B 2 ( :,q+1:m); [U3,D3,V3] = svd(B2,*U ) ; h = sum(diag(D3)>le-20); F = V3( :, I :h) *U35.; Astar = [ A U* F ] ; [U4,D4,V4] = svd(Ghatl’*Astar); h = sum(diag(D4)>le-20); Ql = V4(:,l:h)*U4'; Th Th check if tr close to m Th % trl = trace(Ql’*Astar’*Ghatl); Th QOl = Q l ; end; % end while °h Ac = null(full(Astar)’); [U5,D5,V5] = svd(Ghat2’*Ac); h = sum(diag(D5)>le-20); Q2 = V 5 ( :,I :h)*U5’; Th Th check if tr close to p-m Th % tr2 = trace(Q2’*Ac3 *Ghat2); Th Q = oplus(Ql,Q2); GamO = [Astar Ac]*Q; %% end function Th fisher_scoringl.m function. function [mu,newphi,check_conv] = fisher_scoringl(n,phi,G a m 3S,T l ,T2,D g l , L,check_convO) Th Th one iteration of Fisher_scoring algorithm %% Th [p,p]=Size(Gam);w= size(Dgl,2); Ip2 = speye(p's2) ;Ipp = commute(p,p) ; Np = (Ip2+Ipp)/2; check_eig = I; 155 %% TL set mu to zero TL %% ZetaO = [zeros(u,I) ; ph i ] ; %% H construct matrix of Fl %% %% lam = Tl*exp(T2*phi); Lamda = dvec(L*lam,p,p); Sigma = Gam*Lamda*Gam’; invsigma = inv(Sigma); Fll = 2*Np*kron(Gam*Lamda,Gam)*Dgl; Dll = Tl*diag(exp(T2*phi))*T2; F12 = kron(Gam,Gam)*L*D11; Fl = [ Fll F12] ; 0TL TL construct information matrix TL %% pre = (n/2)*F1’*kron(invsigma,invsigma); ihat = pre*Fl; invihat = inv(ihat); %% °TL construct score function %% TL Szeta = pre*vec(S-Sigma); delta = invihat*Szeta; Zeta = ZetaO + delta; TL TL check if it is on the right direction but too far %% TL the magnitude is reduced TL TL while norm(delta(l:w,:)) > 0 . 2 5 delta. = delta/2; Zeta = ZetaO + delta; end; % end while % check_conv = Szeta5*delta; while check_conv > check_convO delta = delta/2; Zeta = ZetaO + delta; check_conv = Szeta5*delta; end; % end while % TL TL obtain mu and newphi TL TL mu = Zeta(l:w,:); newphi = Zeta(w+1:end,:); %% end function %% fisher_scoring2.m function. 156 function [mu,newphi,check_conv] = f isher_scoring2(n,p h i ,G a m ,S ,D g l , L ,check_convO,X i ,V 2 ,T 3 ,T4) %% U one iteration of Fisher scoring algorithm %% U [p,p]=size(Gam);w= size(Dgl,2); Ip = speye(p); Ip2 = speye(p'"2) ; Ipp = commute(p,p); Np = (Ip2+Ipp)/2; check_eig = I; %% TL set mu to zero TL %% ZetaO = [zeros(w,I ) ; p h i ] ; onep = ones(p,I); TL TL construct Fl TL %% wi = inv(onep’*T3*exp(T4*Xi)); lam = wi*phi(l)*T3*exp(T4*Xi); Lamda = dvec(L*lam,p,p); Sigma = Gam*Lamda*GamJ; invsigma = inv(Sigma); Fll = 2*Np*kron(Gam*Lamda,Gam)*Dgl; Hp = sparse(Ip - (lam/phi(I))♦onep’) ; Wl = sparse(diag(exp(T4*Xi))*T4); TL TL Dll with respect to phiO and phi %% %% Dll = [wi*T3*exp(T4*Xi) wi*phi(I)*Hp*T3*Wl*V2]; F12 = kron(Gam,Gam)*L*Dll; Fl = [ Fll F1 2 ] ; TL TL construct information matrix TL TL pre = (n/2)*F1’*kron(invsigma,invsigma); ihat = pre*Fl; invihat = inv(ihat); TL TL score function TL %% Szeta = pre*vec(S-Sigma); delta = invihat*Szeta; 157 Zeta = ZetaO + delta; Th Th check if is on the right direction but too far Th Th the magnitude is reduced Th %% while norm(delta(l:w,:)) > 0 . 
2 5 delta = delta/2; Zeta = ZetaO + delta; end; % end while % check_conv = Szetaj*delta; while check_conv >= check_convO delta = delta/2; Zeta = ZetaO + delta; check_conv = Szetaj*delta; end; % end while % Th Th obtain mu and newphi Th %% mu = Zeta(l:w,:); newphi = Zeta(w+1:end,:); Th end function Th find-Vstar.m function. function [V3,V4] = find_Vstar(A,Al,M,Gam) Th Th find V = (V3 V4) ensured that C*(Ip2-Ipp)*Al is invertable Th Th [p,q]= size(A); Ip = speye(p); Ip2 = speye(p"2); Ipp = commute(p,p); U Th M is the first m columns of Ip Th Th find Mc which is Ip taken off M js column Th Th b = M J*[l:p] J ; Mc = Ip ; M c (: ,b) = []; Th Th find V3 and V4 using the SVD Th Th V 3 JV3= I, V 4 JV4 = I and V 3 JV4 = 0 Th %% C = kr o n (MeJ,A J *Gam) ; Cin = C*(Ip2-Ipp)*Al; [U,D,V]=svd(full(Cin)); h = sum(diag(D)>le-20); Th Th obtain V4 and V3 %% 158 %% V4 = V(:,l:h); V3 = n u l l(V40; V3 = sparse(V3); V4 = sparse(V4); %% end function %% find-Vstar-bart.m function. function [V3,V4] = f ind_Vstar_bart(A ,A l ,M ,Gam) %% %% find V = (V3 V4) %% H using rref to increase zero entries %% %% Cp, q] = size (A); Ip = speye(p); Ip2 = speye(p~2); Ipp = commute(p,p); %% °/0°/0 Mc is the last p-m columns of Ip %% %% b = M 5*[l:p]’; Mc = Ip ; Mc (: ,b) = [] ; C = kron(Mc',A'*Gam); Cin = C * (Ip2-Ipp)*A1*A1’; AV4 = r r e f (Cin)5; [U,D,V]=svd(full(Cin)); h = sum(diag(D)>le-20); AV4 = A V 4 ( :,I :h); 0Z0Z, 0Z0Z obtain V4 b/c A1'A1V4=V4 %% %% V4 = Al ’ =t=AV4; 0Z0Z 0Z0Z obtain V3 which is orthogonal to V4 %% %% V3 = nu l l (full(V4)’); AV3 = rr e f (V3’* A 1 ; V3 = A1’ *AV3; V3 = sparse(V3); V4 = sparse(V4); 0Z0Zo end function 0Z0Z Solvejtnlel.m function. function [Gamma,Lamda,p h i ,L a ,loglk,conv] = solve_mleI(n,A ,A l ,A 2 ,L ,M , Gam,lam,S,Tl,T2,Dgs) 0Z0Z 159 %% compute maximum likelihood estimates of Gamma and Lamda %% %% [p,q]= size(A); Ip = speye(p); %% 7o7o initial guess of zeta %% %% phi = inv(T2’*T2)*T21*log(inv(Tl3 *T1)*T1’*lam); G = ones(p,p); loglk = [];i = 0; Gamma = Gam; check_conv = I; check_convO = le8; conv = I; 7.7. 7.7. check_conv - Szeta’*invihat*Szeta 7.7. 7.7. La = -((n*p)/2)-(n/2)*log(det(S)); SO = S; while sqrt(check_conv)> le-6 SO = Gam’*S0*Gam; A = rref(A’*Gam) ’; Gam = Ip; [V3,V4] = find_Vstar(A,Al,M,Gam); Dgl = Dgs*V3; [mu,newphi,check_conv] = fisher_scoringl(n,phi,Gam,SO,Tl,T2,Dgl, L,check_convO); check_convO = check_conv; 7.7. 7.7. check convergence 7.7. 7.7. if max(abs(T2*newphi)) > 14 conv = 0; return; end 7. end if % [G,eta2] = solve_eta2(A,Al,A2,V3,M,mu,Gam); phi = newphi; Gam = G; Gamma = Gamma*G; 7.7. 7.7. computer log likelihood function without constant7.7. 7.7. Iamda = Tl*exp(T2*phi); Lamda = dvec(L*lamda,p,p); Sigma = Gam*Lamda*Gam’; invsigma = inv(Sigma); loglk = [loglk ; [ -(n*trace(SO*invsigma))/2-(n*log(det(Sigma)))/2 check_conv]]; end; 7.7. end of while loop 7.7. %% end fu n c t io n Th solve_mle2.m function. 
160 function [Gam,Lamda,p h i O ,Xi3L a 3loglk,conv] = solve_mle2(n,A l ,A 2 ,L ,Gam, lam,S,Dgs,T3,T4,C2) Th Th compute maximum likelihood estimate of Gamma,Lamda,phiO Th 'Th [p,p]= size(Gam); Ip = speye(p); [p,q3]= size(T3); oneq3 = ones(q3,l);onep = ones(p,I); %% Th initial guess of Xi 'Th Th X = [T4 oneqS ]; Xi = inv(X’*X)*X’*log(inv(T3’*T3)*T3’*lam) ; Xi (end) = [] ; G = ones(p,p); loglk = [];i = 0; check_conv = I; check_convO = IeS; conv = I ; La = -((n*p)/2)-(n/2)*log(det(S)); trs = trace (S); W = diag(exp(T4*Xi))*T 4 ; [U,D,V] = svd(C2'*W); h = sum(diag(D)>le-20); Vl = V ( :,I :h); V2 = V (:,h+1:end); phiI = inv(V2’*V2)*V2’*Xi; [Xi,tau] = solve_tau(phil,Xi,T4,Vl,V2,C2); phi = [trs;phi1]; Th Th check_conv = Szeta’*invihat*Szeta 'Th Th while sqrt(check_conv)> le-6 W = diag(exp(T4*Xi))*T4; [U,D,V] = svd(C2’*W ) ; h = sum(diag(D)>le-20); Vl = V ( :,I :h); V2 = V(:,h+1:end); p h i (2:end) = inv(V2’*V2)*V2’*Xi; [Xi,tau] = solve_tau(phi(2:end),Xi,T4,Vl,V2,C2); [mu,newphi,check.conv] = fisher_ scoring2(n,p h i ,Gam,S ,D g s ,L , check.convO,X i ,V 2 ,T 3 ,T 4 ) ; check.convO = check.conv; [Xi,tau] = solve_tau(newphi(2:end),Xi,T4,Vl,V2,C2); 'Th 'Th check convergence Th 161 if max(abs(T4*Xi)) > 14 conv = 0; return; end % end if % eta = solve_eta(Al,A2,mu); VG = Al*mu+A2*eta; G = dvec(VG,p,p); Gam = Gam*G; p h i (I) = newphi(l); %% %% computer log likelihood function without constant0/,0/, U lamda = p h i (I)*T3*exp(T4*Xi)/ (onep1*T3*exp(T4*Xi)); Lamda = dvec(L*lamda,p,p); Sigma = Gam*Lamda*Ga m 5; invsigma = inv(Sigma); loglk = [loglk ; [ - (n*trace(S*invsigma))/2-(n*log(det(Sigma)))/2 check_conv]]; end; °/,0/, end of while loop 7,7, phiO = p h i (I); 7,7, end function 7,7, Bartlettl.m function. function elrt = Bartlettl(Gam,Lam,phi,Tl,T2,A2,L,V3,n,m,nu,Dgs) p = Iength(Gam); ci = size(Tl,2); k = length(m); df=round(sum(nu)); Idf=speye(df); Ip = speye(p); Ic = speye(ci); Idf 2=speye (df'‘2) ; permdf=Commute(df,df); permp = commute(p,p); permp2 = commute (p'‘2 ,p~2) ; permpI = commute(nu(l),nu(l)); Ipp = speye(p~2); Ndf=Idf2+permdf; tNp = (Ipp+permp); Impp = Ipp-permp; Ik = speye(k); Dp = dup(p); Lam = sparse(Lam); Sigma = Gam*Lam*Gam’; 162 Sig = Sigma; Sigi = inv(Sigma); for il=l:2 I{il>=speye(nu(il)); Ipf{il>=speye(nu(il)*p); for 12=11:2 if isempty(nu(il)*nu(i2))==0 I_{il,i2}=commute(nu(il),nu(i2)); I_{i2,il}=I_{il,i2}'; else I_{il,i2}=[]; I_{i2,il}=[] ; end end end 111 = speye(nu(l)~2); NI = (Ill+I_{l,l})/2; nparam=round(sum(nu)); elrt=nparam; 7.7. 7,7» Create Dg(I) , Dg(2) ,Dg(S) 7.7. 7.7. Dl_g_m = sparse(Dgs*V3); K4 = sparse(kron3(Ip,vec(Ip)',Ip)); kdgl = sparse(kron(Dl_g_m,Dl_g_m)); K = K4*kdgl; D2_g_mm = sparse(A2*Dp’* K ) ; Kl = sparse(kron(I{l},permpl)); K2 = sparse(kron(D2_g_mm,Dl_g_m)); KS = sparse(K2+permp2*K2*Kl); K5 = sparse(kron(Dl_g_m,permp*D2_g_mm)); K6 = sparse(kron(Ipp,permp)); clear K Kl K2 DgO Dg Wl D3_g_mmm = -A2*Dp’*K4*(K6*K3+K5); clear KS K4 K5 K6 7.7. 7.7. Create Dl(I) ,Dl(2) ,Dl(S) 7.7. 7.7. 
Dl_l_p = sparse(Tl*diag(exp(T2*phi))*T2); W2 = sparse(I,I ) ; WS = sparse(l,l); for i = I : ci ei = I c (:,i ) ; W2 = W2 + kron(ei*ei',ei*ei’) ; WS = WS + kronS(ei*ei’,ei*ei',e i ’); end; 163 ketl = sparse((kron(exp(T2*phi) ,T V ) ) ’) ; D2_l_pp = sparse(ketl*W2*kron(T2,T2)); D3_l_ppp = sparse(ketl*W3*kron3(T2,T 2 ,T2)); clear W2 W3 Sig=sparse(Sig); Sigi=sparse(Sigi); V2i=kron(Sigi,Sigi ) ; index=[I:2]5; jj=find(nu==0); index(j j) = [] ; U %% Use Dl_{i> for first derivative of vec Sigma with respect to theta{i}'%% %% Theta = (mu5 p h i 5) 5%% Vh Use D2_{i,j> for second derivative of vec Sigma with respect to Vh Vh theta-Ci}5 x theta{j > 5 Vh °h°h Use D3_{i,j,k} for third derivative of vec Sigma with respect to%% %% thetafi}5 x thetalj}5 x theta-Cj> 5Vh Vh kgg = sparse(kron(Gam,Gam) ); kgl = sparse(kron(Gam*Lam,Gam)); kglg = sparse(kron3(Gam,vec(Lam)5,Gam)); kgig = sparse(kron3(Gam,vec(Ip)5,Gam)); kid = sparse(kron(L*Dl_l^p,Dl_g_m)); D1_{1> = sparse(tMp*kron(Gam*Lam,Gam)*Dl_g_m); Dl_{2> = sparse(kgg*L*Dl_l_p); D2_{1,1} = tNp*((kglg*(-kdgl))+kgl*D2_g_mm); D2_{2,2} = kgg*L*D2_l_pp; D2_{2,1> = tNp*kgig*kld; D2_{1,2> = D2_{2,1>*I_{1,2}; kdld2 = kron(-Dl_g_m,D2_g_mm); kpd2dl = kron(permp*D2_g_mm,Dl_g_m); DD_4 = kdld2+2*kpd2dl*kron(I{l},Nl); D3_{1,1,1> = tMp*(kgl*D3_g_mmm+kglg*DD_4); clear DD_4 D3_{2,2,2> = kgg*L*D3_l_ppp; D3_{2,2,1> = tMp*kgig*kron(L*D2_l_pp,Dl_g_m); D3_{2,1,2} = D3_{2,2,l>*kron(I{2>,I_{1,2 » ; D3_{1,2,2> = D3_{2,l,2}*kron(I_{l,2},I{2}); DD_3 = kron 3 (I p ,permp,Ip)*kdgl; D3_{2,1,1> = tNp*((kgig*kron(L*Dl_l_p,D2_g_mm))+ (kron3(vec(L*Dl_l_p)5,Ga m ,Gam)*kron(I{2>,DD_3))); clear DD_3 D3_{1,2,1}=D3_{2,I ,l}*kron(I_{l,2 > ,I { 1 » ; D3_{1,1 ,2}=D3_{1,2, l>*kron(I{l>, I_{1,2»; 164 clear kgg kgl kglg kgig kid kdld2 kpd2dl U %% Computation of Fl %% %% Fl=sparse(I,I); F2=sparse(l,l); F3=sparse(l,l); for 1 = 1 :length(index) Il=Index(I); Fl=Fl+Dl_{ll>*make_E(ll,nu); end; Itheta=sparse(FI5 *V2i*Fl/2); Ithetai=Inv(Itheta); Fisher_InfO=Itheta; clear D1_{1> Dl_{2> %% %% Computation of F2, F l l , & F3 %% %% nindex=length(index); for il=lrnindex ll=index(il); El=make_E(ll,nu); for i2=l:nindex 12=index(i2); E2=make_E(12,nu); if isempty(D2_{11,12>) ==0 F2=F2+D2_{H,12}*kron(El,E2) ; end; end end Fll=reshape(F2,p's2*nparam,nparam); clear D2_{1,1> D2_{2,2} D2_{1,2> D2_{2,1> for il=l:nindex ll=index(il); El=make_E(ll,nu); for i2=l:nindex 12=index(i2); E2=make_E(12,nu); E12=kron(El,E2); for i3=l:nindex 13=index(i3); if isempty(D3_{11,12,13}) == O E3=make_E(13,nu); E123=kron(E12,E3); F3=F3+D3_{11,12,13>*E123; 165 end end end end clear D3_{2,2,2> D3_{2,2,1} D3_{2,1,1> D3_{1,2,2> D3_{2,1,2} D3_{1,2,1> D3_{1,1,2} zeta=zeros(10,I ) ; Ithetai2=kron(Ithetai,Ithetai); ithetai=vec(Ithetai); k3_tmp4=kron3(Ip,vec(Sigma)’,Ip); V3=kron3(Sigi,vec(Sigi),Sigi); F2dd=V2i*F2; Fldd=V2i*Fl; Flddd=V3*Fl; FlFl=kron(Fl,Fl); FlIthetai=Fl*Ithetai; tmp=Fldd’*F2; K3s=2*Flddd ’ *F IF I-re shape (tmp, df '"2, df) ’ -tmp/2; tmp=V3'*FlFl; Ul=reshape (F2dd,p''2*df ,df ) ... 2*reshape(reshape(tmp,df*p~2,df) ’,df,df*p~2); tmpl= kron(Ip,kron(vec(Sigi)’,Ip)*kron(Ip,Fl)); U2= -6*reshape(Flddd',p*2*df,p"2)*F2 + ... +2*tmpl,*(tMp+permp)*tmp + ... reshape (reshape (V2i*F3 ,p''2*df'2, df ) ’ ,pT2*df,df~2) ; Ll=Ul*vec(FlIthetai)/2; L2=vec(Ul*kron(Idf,Fllthetai))/2; tmp=krbn3(Sigi,v e c (Sigi) ’ ,Ip)*FlFl; K4s=12*FlFl’*V3*F2 ... -9*tmp’*permp*tmp ... -2*reshape(Fldd,*F3,df'‘2,df"2) . . . 
-(3/2)*reshape(F2dd’*F2,df'"2,df ~2) ; clear V3 F2dd FlFl tmp tmpl zeta(4)=Ll5 *Ithetai*Ll+L2’*kron(permdf,Ithetai)*L2; clear LI L2 tmp=K3 s*ithetai; zeta(5)=vec(K3s*Mf *Ithetai2*K3s’) ’*ithetai/4+tmpJ*lthetai*tmp/4; clear tmp zeta(10)=vec(Ndf*Ithetai2)’*vec(K4s)/12+ithetai5 *K4s*ithetai/12; zeta(l) = (l/2)*ithetai'*vec(Ul*kron(Ithetai*Fldd',tNp)*k3_tmp4’*F1) ; zeta(2) = (l/3)*ithetai’*vec(K3 s *kron(FIIthet ai,Fllthetai)’*Flddd); zeta(3)=(1/4)*ithetai’*vec(U1*kron(Itheta i ,tNp*kron(Sigma,Sigma))*U1’); zeta(6)=(l/2)*ithetai’*vec(K3s*Ndf*kron(Ithetai,Fllthetai’)*U1’); Tmp=KSs’*Ithetai*Ul; 166 ze t a (7)=Ithetai’*Tmp*vec(FlIthetai)/2; zeta(8)=(1/6)*vec(kron(FlIthetai,Ithetai)*Ndf) ’*vec(U2); ze t a (9)=ithetai’*U2'*vec(Fllthetai’)/6; zeta=zeta/n; elrt=df+sum(zeta); %% end function %% schott-func.m function. function [Tkq,Tstarl,Tstar2,P_Tkq,P_Tstarl,P_Tstar2] = schott_func(A,S, n,p,q,k) Ip = speye(p); df = k*q; Th Th find Tkq using SVD od S Th Th [Ghat,Lhat,Tmp] = svd(S); I = diag(Lhat); Y2 = Ghat(:,k+l:p); T = O; for j = l:k d-Cj} = [] ; for i = k+l:p d{j> = [d{j> ; l(j)*l(i)/(l(i)-l(j))'12 ] ; end; D-Cj} = diag(d{j»; Tj = trace(A’*Ghat(:,j)*Ghat(:,j ) ’*A*inv(A,*Y2*D{j}*Y2,*A)); T = T+Tj; end; % end for % Tkq = n*T; R = Y2*Y2’*A; [u,dl,v] = svd(R); r = sum(diag(dl)>le-20); if r == q PJTkq = I - chi2cdf(Tkq,df); end; if r < q P_Tkq = 0; end; Th Th find Bartlett adjustment factor Th Th Ghat22 = Ghat(p-q+l:p,k+l:p); cl = 0;c2 = 0;c3 = 0; for j = l:k Sj = Ghat22*D{j}*Ghat22’; 167 Q-Cj } = Ghat22 ’ *inv(S j ) *Ghat22; for u = k+1 :p • cll = q * l (u)/ ( K u ) - I (j)) ; cl2 = 2*Q{j>(u-k,u-k)''2*(l(u)''2*l(j)'-3*(2*l(j)+3*l(u))/(l(u)-l(j))-6) ; cl3 =Q{j> (u-k,u-k)* (I(u)*1(j )*(2*1(u)"3-7*1(u)"2*1(j )+4*1(u)*1(j )"2+ l(j)"3)/(l(u)-l(j))"5); cl = cl+(cll+cl2+cl3); for v = k+1:p if v < u c21 = (Q{j > (u-k,u-k)+Q{j > (v-k,v-k))*1(u)* 1 (v)*1(j)"2/((I(u)-I(j ))"2 *(l(v)-l(j))"2); deno = ((l(u)-l(j))"4*(l(v)-l(j))"4); c22 = 2*Q{j > (u-k,v-k)"2*1(u)* 1(v)*1(j)"3*(I(u)* 1(v)*1(j)-3*1(j)"3+ (Ku) +1 (v) ) * (I (v) * (I (j ) -I (u) ) +1 (u) * (I ( j ) -I (v) ) +1 ( j ) "2) ) /deno; c23 = 2*Q{ j > (u-k,u-k) *G){ j > (v-k, v-k) *1 (u) *1 (v) *1 (j ) "3* (I (j ) "3-3*1 (u) *1 (v) *1 ( j) +1 (u) *1 (v) * (Ku) +1 (v) ))/ deno; c2 = c2+ (c21+c22-c23); end; end; end; end; for i = l:k for j = l:k if i ~= j for u = k+1: p c31 = Cj{j>(u-k,u-k)*l(u)*l(j)*l(i)*(l(u)"2-l(j)*l(i))/((l(u) -l(j))"2*(l(u)-l(i))"2*(l(j)-l(i))); c3 = c3 + c31; end; end; end; end; Ckq = l/2*(k-l)+inv(k*q)*(cl - c2 - c 3 ) ; Tstarl = (l-Ckq/n)*Tkq; Tstar2 = Tkq/(l+Ckq/n); P_Tstarl = l-chi2cdf(Tstarl,df); P_Tstar2 = l-chi2cdf(Tstar2,df); Th end function Th Cl function. function f = CI(c01,n,p,Al,A2,L,Gam_ini,lhat,S,Dgs,T3,T4,Cl,loglk) p = pvalue(c01,n,p,Al,A2,L,Gam_ini,lhat,S,Dgs,T3,T4,Cl,loglk); f = (p-0.05)"2; 168 pvalue function. function p = pvalue(cOl,n,p,Al,A2,L,Gam_ini,Ihat,S,Dgs,T3,T4,C l 1Ioglkl) C2 = T 3 ,*(Cl-ones(p,l)*c01); [GamO,L a m O ,phiOO,Xi5La,loglk,convO]=solve_mle2(n,A l ,A 2 ,L ,Gam_ini,Ihat,S , Dgs,T3,T4,C2); LRT = 2*(loglkl-loglk(end,l)); p = l - chi2cdf(LRT5I) The matrix C* from Example 2. C =[] ;L=[] ;1=[] ; Ip = speye(p); linear = [1;2;3;4;5;6;7]; for i= l :6 H i } = linear."!; H = [ones(p,I) C ] ; P = ppo(H); Ci = (Ip-P)*l{i}; C = [C Ci] ; end; for i= l :6 C(:,i)= C ( :,i)/sqrt(C(:,i),*C(:,i)); end; MONTANA STATE UNIVERSITY - BOZEMAN 3 762 10359845 2