Matrix Algebra

Theory, Computations, and Applications in Statistics

 James E. Gentle

Part I, consisting of Chapters 1 through 7, covers most of the material in linear algebra needed by statisticians. (The word “matrix” in the title of the present book may suggest a somewhat more limited domain than “linear algebra”; but I use the former term only because it seems to be more commonly used by statisticians and is used more or less synonymously with the latter term.) The first four chapters cover the basics of vectors and matrices, concentrating on topics that are particularly relevant for statistical applications. In Chapter 4, it is assumed that the reader is generally familiar with the basics of partial differentiation of scalar functions. Chapters 5 through 7 begin to take on more of an applications flavor, as well as beginning to give more consideration to computational methods. Although the details of the computations are not covered in those chapters, the topics addressed are oriented more toward computational algorithms. Chapter 5 covers methods for decomposing matrices into useful factors.
The material in Part I, as in the entire book, was built up recursively. In the first pass, I began with some definitions and followed those with some facts that are useful in applications. In the second pass, I went back and added definitions and additional facts that lead to the results stated in the first pass. The supporting material was added as close to the point where it was needed as practical and as necessary to form a logical flow. Facts motivated by additional applications were also included in the second pass. In subsequent passes, I continued to add supporting material as necessary and to address the linear algebra for additional areas of application. I sought a bare-bones presentation that gets across what I considered to be the theory necessary for most applications in the data sciences. The material chosen for inclusion is motivated by applications.
Throughout the book, some attention is given to numerical methods for computing the various quantities discussed. This is in keeping with my belief that statistical computing should be dispersed throughout the statistics curriculum and statistical literature generally. Thus, unlike in other books on matrix “theory”, I describe the “modified” Gram-Schmidt (MGS) method, rather than just the “classical” Gram-Schmidt (GS) method. (I put “modified” and “classical” in quotes because, to me, GS is MGS. History is interesting, but in computational matters, I do not care to dwell on the methods of the past.) Also, condition numbers of matrices are introduced in the “theory” part of the book, rather than just in the “computational” part. Condition numbers also relate to fundamental properties of the model and the data.
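The practical difference between the two orderings of Gram-Schmidt can be seen in a few lines. The following is a minimal NumPy sketch, not code from the book; the function names are hypothetical. Classical GS projects each original column onto all earlier orthonormal vectors at once, while MGS removes each new direction from the remaining columns immediately, so later projections use already-updated vectors. The test input is the Läuchli matrix, a standard example with nearly dependent columns, with eps chosen (as an assumption) near the square root of double-precision machine epsilon.

```python
import numpy as np

def gram_schmidt_modified(A):
    """QR factorization by modified Gram-Schmidt (MGS).

    Assumes A (m x n, m >= n) has full column rank."""
    Q = np.array(A, dtype=float)  # columns are overwritten in place
    n = Q.shape[1]
    R = np.zeros((n, n))
    for k in range(n):
        R[k, k] = np.linalg.norm(Q[:, k])
        Q[:, k] /= R[k, k]
        # MGS: remove the q_k component from every remaining column right away.
        for j in range(k + 1, n):
            R[k, j] = Q[:, k] @ Q[:, j]
            Q[:, j] -= R[k, j] * Q[:, k]
    return Q, R

def gram_schmidt_classical(A):
    """Classical Gram-Schmidt (CGS), included only for comparison."""
    A = np.array(A, dtype=float)
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for k in range(n):
        # CGS: project the ORIGINAL column a_k onto all earlier q's at once.
        R[:k, k] = Q[:, :k].T @ A[:, k]
        v = A[:, k] - Q[:, :k] @ R[:k, k]
        R[k, k] = np.linalg.norm(v)
        Q[:, k] = v / R[k, k]
    return Q, R

def orthogonality_error(Q):
    """Largest entry of |Q^T Q - I|; zero in exact arithmetic."""
    return np.max(np.abs(Q.T @ Q - np.eye(Q.shape[1])))

# Lauchli matrix: columns that are nearly linearly dependent.
eps = 1e-8
A = np.array([[1.0, 1.0, 1.0],
              [eps, 0.0, 0.0],
              [0.0, eps, 0.0],
              [0.0, 0.0, eps]])

Q_cgs, R_cgs = gram_schmidt_classical(A)
Q_mgs, R_mgs = gram_schmidt_modified(A)
print(orthogonality_error(Q_cgs), orthogonality_error(Q_mgs))
```

In exact arithmetic both functions return the same Q. On this matrix, however, classical GS yields second and third columns whose inner product is about 1/2, while MGS keeps the off-diagonal entries of QᵀQ on the order of eps.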
the form of a mathematical expression and the way the expression should be evaluated in actual practice may be quite different.” (The statement in quotes appears word for word in several places in the book.) Standard textbooks on “matrices for statistical applications” emphasize their uses in the analysis of traditional linear models. This is a large and important field in which real matrices are of interest, and the important kinds of real matrices include symmetric, positive definite, projection, and generalized inverse matrices. This area of application also motivates much of the discussion in this book. In other areas of statistics, however, there are different matrices of interest, including similarity and dissimilarity matrices, stochastic matrices, rotation matrices, and matrices arising from graph-theoretic approaches to data analysis. These matrices have applications in clustering, data mining, stochastic processes, and graphics; therefore, I describe these matrices and their special properties. I also discuss the geometry of matrix algebra, which provides a better intuition for the operations. Homogeneous coordinates and special operations in IR³ are covered because of their geometrical applications in statistical graphics.
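A standard illustration of the quoted principle (chosen here as an example, not taken from the text) is the sample variance. The algebraically equivalent one-pass form (Σx² - n·x̄²)/(n - 1) can lose all accuracy when the data have a large mean relative to their spread, whereas the two-pass form Σ(x - x̄)²/(n - 1) does not. This sketch uses plain Python floats (IEEE double precision); the function names and data values are illustrative assumptions.

```python
def variance_one_pass(xs):
    """'Textbook' form: (sum of x^2 - n * mean^2) / (n - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    return (sum(x * x for x in xs) - n * mean * mean) / (n - 1)

def variance_two_pass(xs):
    """Center the data first, then sum the squared deviations."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

# Large mean, small spread: the exact sample variance is 90 / 3 = 30.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]

print(variance_two_pass(data))  # centered form
print(variance_one_pass(data))  # same mathematics, different evaluation
```

With these data the two-pass form returns exactly 30.0, while the one-pass form does not. Its sum of squares and n·x̄² are both near 4 × 10¹⁸, where adjacent doubles are 512 apart, so their difference cannot resolve the true value of 90.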
Computer arithmetic differs from ordinary arithmetic in many ways; for example, computer arithmetic lacks associativity of addition and multiplication, and series often converge even when they are not supposed to. (On the computer, a straightforward evaluation of $\sum_{x=1}^{\infty} x$ converges!) I emphasize the differences between the abstract number system IR, called the reals, and the computer number system IF, the floating-point numbers, which unfortunately are also often called “real”. Table 10.3 on page 400 summarizes some of these differences. All statisticians should be aware of the effects of these differences. I also discuss the differences between ZZ, the abstract number system called the integers, and the computer number system II, the fixed-point numbers. (Appendix A provides definitions for this and other notation that I use.) Chapter 10 also covers some of the fundamentals of algorithms, such as iterations, recursion, and convergence. It also discusses software development.
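The claims about computer arithmetic are easy to check directly. The sketch below (Python with NumPy; all names are illustrative) shows addition and multiplication failing to associate in IEEE double precision, and a divergent series converging numerically. Running $\sum x$ itself to its floating-point limit would take more than 2^53 iterations, so the sketch instead sums the harmonic series $\sum 1/n$, which also diverges mathematically, in single precision (numpy.float32), where it stalls after a few million terms.

```python
import numpy as np

# Addition is not associative in IEEE double precision.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)

# Multiplication is not associative either: the grouping decides whether
# an intermediate result overflows to infinity.
big = 1e308
prod_left = (big * 10.0) * 0.1   # big * 10.0 overflows to inf
prod_right = big * (10.0 * 0.1)  # 10.0 * 0.1 rounds to exactly 1.0

# Why the sum of x "converges": once a double reaches 2**53, adding 1 no
# longer changes it, so the term x stops growing; the partial sum then stops
# changing once it is large enough that adding x is absorbed by rounding.
stuck = (2.0**53 + 1.0 == 2.0**53)

def harmonic_until_stalled():
    """Sum 1/n in float32 until the partial sum stops changing."""
    s = np.float32(0.0)
    n = 0
    while True:
        n += 1
        new_s = s + np.float32(1.0) / np.float32(n)
        if new_s == s:  # the term is below half an ulp of s
            return n, s
        s = new_s

print(left, right, prod_left, prod_right, stuck)
print(harmonic_until_stalled())
```

The harmonic loop finishes in a few seconds at most; the partial sum settles a little above 15, although in exact arithmetic the series grows without bound.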
The prerequisites for this text are minimal. Obviously some background in mathematics is necessary. Some background in statistics or data analysis and some level of scientific computer literacy are also required. References to rather advanced mathematical topics are made in a number of places in the text. To some extent this is because many sections evolved from class notes that I developed for various courses that I have taught. All of these courses were at the graduate level in the computational and statistical sciences, but they have had wide ranges in mathematical level. I have carefully reread the sections that refer to groups, fields, measure theory, and so on, and am convinced that if the reader does not know much about these topics, the material is still understandable, but if the reader is familiar with these topics, the references add to that reader’s appreciation of the material. In many places, I refer to computer programming, and some of the exercises require some programming.
In regard to the use of the book as a text, most of the book evolved in one way or another for my own use in the classroom. I must quickly admit, however, that I have never used this whole book as a text for any single course. I have used Part III in the form of printed notes as the primary text for a course in the “foundations of computational science” taken by graduate students in the natural sciences (including a few statistics students, but dominated by physics students). I have provided several sections from Parts I and II in online PDF files as supplementary material for a two-semester course in mathematical statistics at the “baby measure theory” level (using Shao, 2003). Likewise, for my courses in computational statistics and statistical visualization, I have provided many sections, either as supplementary material or as the primary text, in online PDF files or printed notes. I have not taught a regular “applied statistics” course in almost 30 years, but if I did, I am sure that I would draw heavily from Parts I and II for courses in regression or multivariate analysis.
Penultimately, I must make some statement about the relationship of this book to some other books on similar topics. Much important statistical theory and many methods make use of matrix theory, and many statisticians have contributed to the advancement of matrix theory from its very early days. Widely used books with derivatives of the words “statistics” and “matrices/linear-algebra” in their titles include Basilevsky (1983), Graybill (1983), Harville (1997), Schott (2004), and Searle (1982). All of these are useful books. The computational orientation of this book is probably the main difference between it and these other books. Also, some of these other books only address topics of use in linear models, whereas this book also discusses matrices useful in graph theory, stochastic processes, and other areas of application. (If the applications are only in linear models, most matrices of interest are symmetric, and all eigenvalues can be considered to be real.) Other differences among all of these books, of course, involve the authors’ choices of secondary topics and the ordering of the presentation.

