3 Determinants and Diagonalization
Introduction
With each square matrix we can calculate a number, called the determinant of the matrix, which tells us whether or not the matrix is invertible. In fact, determinants can be used to give a formula for the inverse of a matrix. They also arise in calculating certain numbers (called eigenvalues) associated with the matrix. These eigenvalues are essential to a technique called diagonalization that is used in many applications where it is desired to predict the future behaviour of a system. For example, we use it to predict whether a species will become extinct.
Determinants were first studied by Leibnitz in 1696, and the term “determinant” was first used in 1801 by Gauss in his Disquisitiones Arithmeticae. Determinants are much older than matrices (which were introduced by Cayley in 1878) and were used extensively in the eighteenth and nineteenth centuries, primarily because of their significance in geometry. Although they are somewhat less important today, determinants still play a role in the theory and application of matrix algebra.
3.1 The Cofactor Expansion
In Section 2.4, we defined the determinant of a $2 \times 2$ matrix
$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$
as follows:
$\det A = \begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc$
and showed (in Example 2.4.4) that $A$ has an inverse if and only if $\det A \neq 0$. One objective of this chapter is to do this for any square matrix A. There is no difficulty for $1 \times 1$ matrices: If $A = [a]$, we define $\det A = a$ and note that $A$ is invertible if and only if $a \neq 0$.
If $A$ is $3 \times 3$ and invertible, we look for a suitable definition of $\det A$ by trying to carry $A$ to the identity matrix by row operations. The first column is not zero ($A$ is invertible); suppose the (1, 1)-entry is not zero. Then row operations give
where and . Since is invertible, one of and is nonzero (by Example 2.4.11); suppose that . Then the reduction proceeds
where . We define
(3.1)
and observe that $\det A \neq 0$ (because $A$ is invertible).
To motivate the definition below, collect the terms in Equation 3.1 involving the entries in row 1 of $A$:
This last expression can be described as follows: To compute the determinant of a $3 \times 3$ matrix $A$, multiply each entry in row 1 by a sign times the determinant of the $2 \times 2$ matrix obtained by deleting the row and column of that entry, and add the results. The signs alternate down row 1, starting with $+$. It is this observation that we generalize below.
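For reference, the recipe just described can be written out symbolically (a reconstruction of the formula, with the entries of $A$ written as $a_{ij}$, notation of our choosing):
$$\det\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} = a_{11}\det\begin{bmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{bmatrix} - a_{12}\det\begin{bmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{bmatrix} + a_{13}\det\begin{bmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{bmatrix}$$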
Example 3.1.1
This suggests an inductive method of defining the determinant of any square matrix in terms of determinants
of matrices one size smaller. The idea is to define determinants of $3 \times 3$ matrices in terms of determinants of $2 \times 2$ matrices,
then determinants of $4 \times 4$ matrices in terms of determinants of $3 \times 3$ matrices, and so on.
To describe this, we need some terminology.
Definition 3.1 Cofactors of a matrix
Assume that determinants of $(n-1) \times (n-1)$ matrices have been defined. Given the $n \times n$ matrix $A$, let
$A_{ij}$ denote the $(n-1) \times (n-1)$ matrix obtained from $A$ by deleting row $i$ and column $j$.
Then the $(i, j)$-cofactor $c_{ij}(A)$ is the scalar defined by
$c_{ij}(A) = (-1)^{i+j}\det\left(A_{ij}\right)$
Here $(-1)^{i+j}$ is called the sign of the $(i, j)$-position.
The sign of a position is clearly $+1$ or $-1$, and the following diagram is useful for remembering it:
$\begin{bmatrix} + & - & + & \cdots \\ - & + & - & \cdots \\ + & - & + & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{bmatrix}$
Note that the signs alternate along each row and column with $+$ in the upper left corner.
Example 3.1.2
Solution:
Here is the matrix
that remains when row and column are deleted. The sign of position is (this is also the -entry in the sign diagram), so the -cofactor is
Turning to position , we find
Finally, the -cofactor is
Clearly other cofactors can be found—there are nine in all, one for each position in the matrix.
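To make the bookkeeping concrete, here is a minimal sketch (assuming Python with NumPy; the function name and the test matrix are ours, not the text's) of the cofactor computation just carried out by hand:

```python
import numpy as np

def cofactor(A, i, j):
    # (i, j)-cofactor of A: the sign (-1)^(i+j) times the determinant of the
    # matrix obtained by deleting row i and column j. Indices are 0-based here,
    # so this is the (i+1, j+1)-cofactor in the text's 1-based notation.
    minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
    return (-1) ** (i + j) * np.linalg.det(minor)

A = np.array([[3.0, -1.0, 2.0],
              [5.0,  6.0, 0.0],
              [1.0,  4.0, 8.0]])
print(cofactor(A, 0, 1))   # the (1, 2)-cofactor of A in the text's notation
```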
We can now define $\det A$ for any square matrix $A$.
Definition 3.2 Cofactor expansion of a Matrix
Assume that determinants of $(n-1) \times (n-1)$ matrices have been defined. If $A = \left[a_{ij}\right]$ is an $n \times n$ matrix, define
$\det A = a_{11}c_{11}(A) + a_{12}c_{12}(A) + \cdots + a_{1n}c_{1n}(A)$
This is called the cofactor expansion of $\det A$ along row 1.
It asserts that $\det A$ can be computed by multiplying the entries of row 1 by the corresponding
cofactors, and adding the results. The astonishing thing is that $\det A$ can be computed by taking the cofactor expansion along any row or column: Simply multiply each entry of that row or column by the corresponding cofactor and add.
Theorem 3.1.1 Cofactor Expansion Theorem
The determinant of an $n \times n$ matrix $A$ can be computed using the cofactor expansion along any row or column of $A$. That is, $\det A$ can be computed by multiplying each entry of the row or column by the corresponding cofactor and adding the results.
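Since the matrices in the following examples are not reproduced here, the sketch below (Python with NumPy, using a $3 \times 3$ matrix of our own choosing) implements the cofactor expansion recursively and checks Theorem 3.1.1 numerically: expanding along any row gives the same answer as NumPy's built-in determinant.

```python
import numpy as np

def det_by_cofactors(A, row=0):
    # Determinant by cofactor expansion along the given row (0-based).
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        minor = np.delete(np.delete(A, row, axis=0), j, axis=1)
        total += (-1) ** (row + j) * A[row, j] * det_by_cofactors(minor)
    return total

A = np.array([[ 2.0, 3.0, 7.0],
              [-4.0, 0.0, 6.0],
              [ 1.0, 5.0, 0.0]])
# Expansion along every row agrees with NumPy's determinant, as Theorem 3.1.1 asserts.
print([round(det_by_cofactors(A, r), 6) for r in range(3)])
print(round(np.linalg.det(A), 6))
```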
Example 3.1.3
.
Solution:
The cofactor expansion along the first row is as follows:
Note that the signs alternate along the row (indeed along any row or column). Now we compute $\det A$ by expanding along the first column.
The reader is invited to verify that $\det A$ can be computed by expanding along any other row or column.
The fact that the cofactor expansion along any row or column of a matrix always gives the same result (the determinant of the matrix) is remarkable, to say the least. The choice of a particular row or column can simplify the calculation.
Example 3.1.4
.
Solution:
The first choice we must make is which row or column to use in the
cofactor expansion. The expansion involves multiplying entries by
cofactors, so the work is minimized when the row or column contains as
many zero entries as possible. Row is a best choice in this matrix
(column would do as well), and the expansion is
This is the first stage of the calculation, and we have succeeded in expressing the determinant of the $4 \times 4$ matrix
in terms of the determinant of a $3 \times 3$ matrix. The next stage involves
this matrix. Again, we can use any row or column for the cofactor
expansion. The third column is preferred (with two zeros), so
This completes the calculation.
This example shows us that calculating a determinant is simplified a great deal when a row or column consists mostly of zeros. (In fact, when a row or column consists entirely of zeros, the determinant is zero—simply expand along that row or column.) We did learn that one method of creating zeros in a matrix is to apply elementary row operations to it. Hence, a natural question to ask is what effect such a row operation has on the determinant of the matrix. It turns out that the effect is easy to determine and that elementary column operations can be used in the same way. These observations lead to a technique for evaluating determinants that greatly reduces the labour involved. The necessary information is given in Theorem 3.1.2.
Theorem 3.1.2
Let $A$ denote an $n \times n$ matrix.
- If $A$ has a row or column of zeros, $\det A = 0$.
- If two distinct rows (or columns) of $A$ are interchanged, the determinant of the resulting matrix is $-\det A$.
- If a row (or column) of $A$ is multiplied by a constant $u$, the determinant of the resulting matrix is $u(\det A)$.
- If two distinct rows (or columns) of $A$ are identical, $\det A = 0$.
- If a multiple of one row of $A$ is added to a different row (or if a multiple of a column is added to a different column), the determinant of
the resulting matrix is $\det A$.
The following four examples illustrate how Theorem 3.1.2 is used to evaluate determinants.
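Before turning to those examples, here is a small numerical check (a sketch, assuming Python with NumPy and a matrix chosen only for illustration) of properties 2, 3, and 5 of Theorem 3.1.2:

```python
import numpy as np

A = np.array([[1.0, 2.0, -1.0],
              [3.0, 0.0,  4.0],
              [2.0, 5.0,  1.0]])
d = np.linalg.det(A)

B = A[[1, 0, 2], :]                             # interchange rows 1 and 2
C = A.copy(); C[2, :] = 7 * C[2, :]             # multiply row 3 by the constant 7
E = A.copy(); E[1, :] = E[1, :] + 5 * E[0, :]   # add 5 times row 1 to row 2

print(np.isclose(np.linalg.det(B), -d))      # property 2: the determinant changes sign
print(np.isclose(np.linalg.det(C), 7 * d))   # property 3: the determinant is multiplied by 7
print(np.isclose(np.linalg.det(E), d))       # property 5: the determinant is unchanged
```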
Example 3.1.5
.
Solution:
The matrix does have zero entries, so expansion along (say) the second row would involve somewhat less work. However, a column operation can be
used to get a zero in position (2, 3)—namely, add column 1 to column 3. Because this does not change the value of the determinant, we obtain
where we expanded the second matrix along row 2.
Example 3.1.6
evaluate where .
Solution:
First take common factors out of rows 2 and 3.
Now subtract the second row from the first and interchange the last two rows.
The determinant of a matrix is a sum of products of its entries. In particular, if these entries are polynomials in $x$, then the determinant itself is a polynomial in $x$. It is often of interest to determine which values of $x$ make the determinant zero, so it is very useful if the determinant is given in factored form. Theorem 3.1.2 can help.
Example 3.1.7
.
Solution:
To evaluate $\det A$, first subtract $x$ times row 1 from rows 2 and 3.
At this stage we could simply evaluate the determinant (the result is ). But then we would have to factor this polynomial to find the values of that make it zero. However, this factorization can be obtained directly by first factoring each entry in the determinant and taking a common
factor of from each row.
Hence, means , that is or .
Example 3.1.8
Solution:
Begin by subtracting row 1 from rows 2 and 3, and then expand along column 1:
Now and are common factors in rows 1 and 2, respectively, so
The matrix in Example 3.1.8 is called a Vandermonde matrix, and the formula for its determinant can be generalized to the $n \times n$ case.
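For the record, the general version alluded to here is the standard Vandermonde formula (stated with variables $x_1, \dots, x_n$ of our choosing):
$$\det\begin{bmatrix} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \\ x_1^2 & x_2^2 & \cdots & x_n^2 \\ \vdots & \vdots & & \vdots \\ x_1^{n-1} & x_2^{n-1} & \cdots & x_n^{n-1} \end{bmatrix} = \prod_{1 \le i < j \le n}\left(x_j - x_i\right)$$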
If $A$ is an $n \times n$ matrix, forming $cA$ means multiplying every row of $A$ by $c$. Applying property 3 of Theorem 3.1.2, we can take the common factor $c$ out of each row and so obtain the following useful result.
Theorem 3.1.3
If $A$ is an $n \times n$ matrix, then $\det(cA) = c^n \det A$.
The next example displays a type of matrix whose determinant is easy to compute.
Example 3.1.9
.
Solution:
Expand along row 1 to get . Now expand this along the top row to get , the product of the main diagonal entries.
A square matrix is called a lower triangular matrix if all entries above the main diagonal are zero (as in Example 3.1.9). Similarly, an upper triangular matrix is one for which all entries below the main diagonal are zero. A triangular matrix is one that is either upper or lower triangular. Theorem 3.1.4 gives an easy rule for calculating the determinant of any triangular matrix.
Theorem 3.1.4
If $A$ is a square triangular matrix, then $\det A$ is the product of the entries on the main diagonal.
Theorem 3.1.4 is useful in computer calculations because it is a routine matter to carry a matrix to triangular form using row operations.
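To illustrate that remark, here is a sketch (Python with NumPy; the elimination strategy and the test matrix are ours) that reduces a matrix to upper triangular form with row operations, tracks their effect using Theorem 3.1.2, and then multiplies the diagonal entries as in Theorem 3.1.4:

```python
import numpy as np

def det_by_reduction(A):
    # Reduce A to upper triangular form U by row operations, tracking their
    # effect on the determinant (Theorem 3.1.2): interchanging two rows flips
    # the sign, and adding a multiple of one row to another changes nothing.
    # Then det A = sign * (product of the diagonal entries of U) by Theorem 3.1.4.
    U = np.array(A, dtype=float)
    n = U.shape[0]
    sign = 1.0
    for k in range(n):
        pivot = k + np.argmax(np.abs(U[k:, k]))   # partial pivoting
        if np.isclose(U[pivot, k], 0.0):
            return 0.0                            # triangular form has a zero on the diagonal
        if pivot != k:
            U[[k, pivot]] = U[[pivot, k]]         # row interchange
            sign = -sign
        for i in range(k + 1, n):
            U[i] = U[i] - (U[i, k] / U[k, k]) * U[k]   # determinant unchanged
    return sign * np.prod(np.diag(U))

A = [[ 2.0, 1.0, 3.0],
     [ 4.0, 1.0, 7.0],
     [-2.0, 2.0, 5.0]]
print(det_by_reduction(A), np.linalg.det(np.array(A)))
```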
3.2 Determinants and Matrix Inverses
In this section, several theorems about determinants are derived. One consequence of these theorems is that a square matrix $A$ is invertible if and only if $\det A \neq 0$. Moreover, determinants are used to give a formula for $A^{-1}$ which, in turn, yields a formula (called Cramer’s rule) for the
solution of any system of linear equations with an invertible coefficient matrix.
We begin with a remarkable theorem (due to Cauchy in 1812) about the determinant of a product of matrices.
Theorem 3.2.1 Product Theorem
If $A$ and $B$ are $n \times n$ matrices, then $\det(AB) = \det A \, \det B$.
The complexity of matrix multiplication makes the product theorem quite unexpected. Here is an example where it reveals an important numerical identity.
Example 3.2.1
If and
then .
Hence gives the identity
Theorem 3.2.1 extends easily to products of more than two matrices. In fact, induction gives
$\det\left(A_1 A_2 \cdots A_{k-1} A_k\right) = \det A_1 \, \det A_2 \cdots \det A_{k-1} \, \det A_k$
for any square matrices $A_1, \dots, A_k$ of the same size. In particular, if each $A_i = A$, we obtain
$\det\left(A^k\right) = (\det A)^k, \quad \text{for each } k \geq 1$
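As a quick numerical illustration (a sketch in Python with NumPy, using matrices of our own choosing), both the product rule and its consequence for powers can be checked directly:

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [-2.0, 5.0]])

# Product theorem: det(AB) = det(A) det(B)
print(np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B)))

# Consequence for powers: det(A^k) = (det A)^k
k = 4
print(np.isclose(np.linalg.det(np.linalg.matrix_power(A, k)), np.linalg.det(A) ** k))
```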
We can now give the invertibility condition.
Theorem 3.2.2
A square matrix $A$ is invertible if and only if $\det A \neq 0$. When this is the case, $\det\left(A^{-1}\right) = \frac{1}{\det A}$.
Proof:
If $A$ is invertible, then $AA^{-1} = I$; so the product theorem gives
$1 = \det I = \det\left(AA^{-1}\right) = \det A \, \det A^{-1}$
Hence, $\det A \neq 0$ and also $\det A^{-1} = \frac{1}{\det A}$.
Conversely, if $\det A \neq 0$, we show that $A$ can be carried to $I$ by elementary row operations (and invoke Theorem 2.4.5). Certainly, $A$ can be carried to its reduced row-echelon form $R$, so $R = E_k \cdots E_2 E_1 A$ where the $E_i$ are elementary matrices (Theorem 2.5.1). Hence the product theorem gives
$\det R = \det E_k \cdots \det E_2 \, \det E_1 \, \det A$
Since $\det E \neq 0$ for all elementary matrices $E$, this shows $\det R \neq 0$. In particular, $R$ has no row of zeros, so $R = I$ because $R$ is square and reduced row-echelon. This is what we wanted.
Example 3.2.2
have an inverse?
Solution:
Compute by first adding times column 1 to column 3 and then expanding along row 1.
Hence, if or , and has an inverse if and .
Example 3.2.3
Solution:
We have by the product theorem, and by Theorem 3.2.2 because is invertible. Hence
so for each . This shows that each is invertible, again by Theorem 3.2.2.
Theorem 3.2.3
If $A$ is any square matrix, then $\det\left(A^T\right) = \det A$.
Proof:
Consider first the case of an elementary matrix $E$. If $E$ is of type I or II, then $E^T = E$; so certainly $\det E^T = \det E$. If $E$ is of type III, then $E^T$ is also of type III; so $\det E^T = 1 = \det E$ by Theorem 3.1.2. Hence, $\det E^T = \det E$ for every elementary matrix $E$.
Now let $A$ be any square matrix. If $A$ is not invertible, then neither is $A^T$; so $\det A^T = 0 = \det A$ by Theorem 3.2.2. On the other hand, if $A$ is invertible, then $A = E_k \cdots E_2 E_1$, where the $E_i$ are elementary matrices (Theorem 2.5.2). Hence, $A^T = E_1^T E_2^T \cdots E_k^T$ so the product theorem gives
$\det A^T = \det E_1^T \, \det E_2^T \cdots \det E_k^T = \det E_1 \, \det E_2 \cdots \det E_k = \det A$
This completes the proof.
Example 3.2.4
Solution:
We use several of the facts just derived.
Example 3.2.5
Solution:
If $A$ is orthogonal, we have $I = AA^T$. Take determinants to obtain
$1 = \det I = \det\left(AA^T\right) = \det A \, \det A^T = (\det A)^2$
Since $\det A$ is a number, this means $\det A = \pm 1$.
Adjugates
In Section 2.4 we defined the adjugate of a $2 \times 2$ matrix
$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$
to be $\text{adj}(A) = \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$.
Then we verified that $A\,\text{adj}(A) = (\det A)I = \text{adj}(A)\,A$ and hence that, if $\det A \neq 0$, $A^{-1} = \frac{1}{\det A}\text{adj}(A)$. We are now able to define the adjugate of an arbitrary square matrix and to show that this formula for the inverse remains valid (when the
inverse exists).
Recall that the $(i, j)$-cofactor $c_{ij}(A)$ of a square matrix $A$ is a number defined for each position $(i, j)$ in the matrix. If $A$ is a square matrix, the cofactor matrix of $A$ is defined to be the matrix $\left[c_{ij}(A)\right]$ whose $(i, j)$-entry is the $(i, j)$-cofactor of $A$.
Definition 3.3 Adjugate of a Matrix
The adjugate of $A$, denoted $\text{adj}(A)$, is the transpose of the cofactor matrix; in symbols,
$\text{adj}(A) = \left[c_{ij}(A)\right]^T$
Example 3.2.6
and calculate and .
Solution:
We first find the cofactor matrix.
Then the adjugate of is the transpose of this cofactor matrix.
The computation of gives
and the reader can verify that $\text{adj}(A)\,A$ gives the same result. Hence, analogy with the $2 \times 2$ case would indicate that $A\,\text{adj}(A) = (\det A)I$; this is, in fact, the case.
The relationship $A\,\text{adj}(A) = (\det A)I$ holds for any square matrix $A$.
Theorem 3.2.4 Adjugate formula
If $A$ is any square matrix, then
$A\,\text{adj}(A) = (\det A)I = \text{adj}(A)\,A$
In particular, if $\det A \neq 0$, the inverse of $A$ is given by
$A^{-1} = \frac{1}{\det A}\text{adj}(A)$
It is important to note that this theorem is not an efficient way to find the inverse of the matrix $A$. For example, if $A$ were $n \times n$, the calculation of $\text{adj}(A)$ would require computing $n^2$ determinants of $(n-1) \times (n-1)$ matrices! On the other hand, the matrix inversion algorithm would find $A^{-1}$ with about the same effort as finding $\det A$. Clearly, Theorem 3.2.4 is not a practical result: its virtue is that it gives a formula for $A^{-1}$ that is useful for theoretical purposes.
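Inefficient or not, the adjugate formula is easy to state in code. The following sketch (Python with NumPy; the function name and the test matrix are ours) builds the adjugate from cofactors and verifies the identities of Theorem 3.2.4:

```python
import numpy as np

def adjugate(A):
    # Adjugate of A: the transpose of the cofactor matrix.
    n = A.shape[0]
    C = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return C.T

A = np.array([[ 1.0,  3.0, -2.0],
              [ 0.0,  1.0,  5.0],
              [-2.0, -6.0,  7.0]])
adjA = adjugate(A)
d = np.linalg.det(A)
print(np.allclose(A @ adjA, d * np.eye(3)))     # A adj(A) = (det A) I
print(np.allclose(adjA @ A, d * np.eye(3)))     # adj(A) A = (det A) I
print(np.allclose(np.linalg.inv(A), adjA / d))  # inverse via the adjugate formula
```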
Example 3.2.7
Solution:
First compute
Since ,
the -entry of is the -entry of the matrix ; that is, it equals
Example 3.2.8
Solution:
Write ; we must show that . We have by Theorem 3.2.4, so taking determinants gives . Hence we are done if . Assume ; we must show that , that is, is not invertible. If , this follows from ; if , it follows because then .
Cramer’s Rule
Theorem 3.2.4 has a nice application to linear equations. Suppose
$A\mathbf{x} = \mathbf{b}$
is a system of $n$ equations in $n$ variables $x_1, x_2, \dots, x_n$. Here $A$ is the $n \times n$ coefficient matrix and $\mathbf{x}$ and $\mathbf{b}$ are the columns
of variables and constants, respectively. If $\det A \neq 0$, we left multiply by $A^{-1}$ to obtain the solution $\mathbf{x} = A^{-1}\mathbf{b}$. When we use the adjugate formula, this becomes
$\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \frac{1}{\det A}\,\text{adj}(A)\,\mathbf{b}$
Hence, the variables $x_1, x_2, \dots, x_n$ are given by the entries of this column; in particular,
$x_1 = \frac{1}{\det A}\left[b_1 c_{11}(A) + b_2 c_{21}(A) + \cdots + b_n c_{n1}(A)\right]$
Now the quantity $b_1 c_{11}(A) + b_2 c_{21}(A) + \cdots + b_n c_{n1}(A)$ occurring in the formula for $x_1$ looks like the cofactor expansion of the determinant of a matrix. The cofactors involved are $c_{11}(A), c_{21}(A), \dots, c_{n1}(A)$, corresponding to the first column of $A$. If $A_1$ is obtained from $A$ by replacing the first column of $A$ by $\mathbf{b}$, then $c_{i1}(A_1) = c_{i1}(A)$ for each $i$ because column 1 is deleted when computing them. Hence, expanding $\det(A_1)$ by the first column gives
$\det A_1 = b_1 c_{11}(A_1) + b_2 c_{21}(A_1) + \cdots + b_n c_{n1}(A_1) = b_1 c_{11}(A) + b_2 c_{21}(A) + \cdots + b_n c_{n1}(A)$
Hence, $x_1 = \dfrac{\det A_1}{\det A}$ and similar results hold for the other variables.
Theorem 3.2.5 Cramer’s Rule
If $A$ is an invertible $n \times n$ matrix, the solution to the system
$A\mathbf{x} = \mathbf{b}$
of $n$ equations in the variables $x_1, x_2, \dots, x_n$ is given by
$x_1 = \frac{\det A_1}{\det A}, \quad x_2 = \frac{\det A_2}{\det A}, \quad \dots, \quad x_n = \frac{\det A_n}{\det A}$
where, for each $k$, $A_k$ is the matrix obtained from $A$ by replacing column $k$ by $\mathbf{b}$.
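The rule translates directly into code. The following sketch (Python with NumPy; the system is invented for illustration and is not the one in Example 3.2.9) solves a small system by Cramer's rule and compares the answer with a direct solve:

```python
import numpy as np

def cramer_solve(A, b):
    # Cramer's rule: x_k = det(A_k) / det(A), where A_k is A with column k
    # replaced by b. Requires det(A) != 0.
    d = np.linalg.det(A)
    x = np.zeros(len(b))
    for k in range(len(b)):
        Ak = A.copy()
        Ak[:, k] = b                       # replace column k by the constants
        x[k] = np.linalg.det(Ak) / d
    return x

A = np.array([[2.0, 1.0, -1.0],
              [1.0, 3.0,  2.0],
              [3.0, 0.0,  1.0]])
b = np.array([1.0, 4.0, 2.0])
print(cramer_solve(A, b))
print(np.linalg.solve(A, b))   # agrees with the gaussian-elimination solver
```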
Example 3.2.9
Solution:
Compute the determinants of the coefficient matrix and the matrix obtained from it by replacing the first column by the column of constants.
Hence, by Cramer’s rule.
Cramer’s rule is not an efficient way to solve linear systems or invert matrices. True, it enabled us to calculate $x_1$ here without computing any of the other variables. Although this might seem an advantage, the truth of the matter is that, for large systems of equations, the number of computations needed to find all the variables by the gaussian algorithm is comparable to the number required to find one of the determinants involved in Cramer’s rule. Furthermore, the gaussian algorithm works when the matrix of the system is not invertible and even when the coefficient matrix is not square. Like the adjugate formula, then, Cramer’s rule is not a practical numerical technique; its virtue is theoretical.
3.3 Diagonalization and Eigenvalues
The world is filled with examples of systems that evolve in time—the weather in a region, the economy of a nation, the diversity of an ecosystem, etc. Describing such systems is difficult in general and various methods have been developed in special cases. In this section we describe one such method, called diagonalization, which is one of the most important techniques in linear algebra. A very fertile example of this procedure is in modelling the growth of the population of an animal species. This has attracted more attention in recent years with the ever increasing awareness that many species are endangered. To motivate the technique, we begin by setting up a simple model of a bird population in which we make assumptions about survival and reproduction rates.
Example 3.3.1
Consider the evolution of the population of a species of birds. Because the number of males and females are nearly equal, we count only females. We assume that each female remains a juvenile for one year and then becomes an adult, and that only adults have offspring. We make three assumptions about reproduction and survival rates:
- The number of juvenile females hatched in any year is twice the number of adult females alive the year before (we say the reproduction rate is 2).
- Half of the adult females in any year survive to the next year (the adult survival rate is $\frac{1}{2}$).
- One-quarter of the juvenile females in any year survive into adulthood (the juvenile survival rate is $\frac{1}{4}$).
If there were 100 adult females and 40 juvenile females alive initially, compute the population of females $k$ years later.
Solution:
Let $a_k$ and $j_k$ denote, respectively, the number of adult and juvenile females after $k$ years, so that the total female population is the sum $a_k + j_k$. Assumption 1 shows that $j_{k+1} = 2a_k$, while assumptions 2 and 3 show that $a_{k+1} = \frac{1}{2}a_k + \frac{1}{4}j_k$. Hence the numbers $a_k$ and $j_k$ in successive years are related by the following equations:
$a_{k+1} = \tfrac{1}{2}a_k + \tfrac{1}{4}j_k$
$j_{k+1} = 2a_k$
If we write
$\mathbf{v}_k = \begin{bmatrix} a_k \\ j_k \end{bmatrix}$
and
$A = \begin{bmatrix} \frac{1}{2} & \frac{1}{4} \\ 2 & 0 \end{bmatrix}$
these equations take the matrix form
$\mathbf{v}_{k+1} = A\mathbf{v}_k, \quad \text{for each } k = 0, 1, 2, \dots$
Taking $k = 0$ gives $\mathbf{v}_1 = A\mathbf{v}_0$, then taking $k = 1$ gives $\mathbf{v}_2 = A\mathbf{v}_1 = A^2\mathbf{v}_0$, and taking $k = 2$ gives $\mathbf{v}_3 = A\mathbf{v}_2 = A^3\mathbf{v}_0$. Continuing in this way, we get
$\mathbf{v}_k = A^k\mathbf{v}_0, \quad \text{for each } k = 1, 2, \dots$
Since
$\mathbf{v}_0 = \begin{bmatrix} a_0 \\ j_0 \end{bmatrix} = \begin{bmatrix} 100 \\ 40 \end{bmatrix}$
is known, finding the population profile amounts to computing $A^k$ for all $k \geq 0$. We will complete this calculation in Example 3.3.12 after some new techniques have been developed.
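Although Example 3.3.12 is still to come, the early years of the population can already be confirmed numerically. The following sketch (Python with NumPy) encodes the recurrence as set up above, with $\mathbf{v}_k$ listing adults first, and iterates it directly:

```python
import numpy as np

A = np.array([[0.5, 0.25],    # a_{k+1} = (1/2) a_k + (1/4) j_k
              [2.0, 0.0 ]])   # j_{k+1} = 2 a_k
v = np.array([100.0, 40.0])   # v_0: 100 adult and 40 juvenile females

for k in range(1, 6):
    v = A @ v                 # v_k = A v_{k-1}, so v_k = A^k v_0
    print(k, v, "total females:", v.sum())
```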
Let $A$ be a fixed $n \times n$ matrix. A sequence $\mathbf{v}_0, \mathbf{v}_1, \mathbf{v}_2, \dots$ of column vectors in $\mathbb{R}^n$ is called a linear dynamical system if $\mathbf{v}_0$ is known and the other $\mathbf{v}_k$ are determined (as in Example 3.3.1) by the conditions $\mathbf{v}_{k+1} = A\mathbf{v}_k$ for each $k \geq 0$. (Many models regard $\mathbf{v}_t$ as a continuous function of the time $t$, and replace this condition relating $\mathbf{v}_{k+1}$ to $A\mathbf{v}_k$ with a differential relationship between them viewed as functions of time.)
These conditions are called a matrix recurrence for the vectors $\mathbf{v}_k$. As in Example 3.3.1, they imply that
$\mathbf{v}_k = A^k\mathbf{v}_0 \quad \text{for each } k \geq 0$
so finding the columns $\mathbf{v}_k$ amounts to calculating $A^k$ for $k \geq 0$.
Direct computation of the powers $A^k$ of a square matrix $A$ can be time-consuming, so we adopt an indirect method that is commonly used. The idea is to first diagonalize the matrix $A$, that is, to find an invertible matrix $P$ such that
$P^{-1}AP = D \text{ is a diagonal matrix}$
(3.8)
This works because the powers $D^k$ of the diagonal matrix $D$ are easy to compute, and Equation (3.8) enables us to compute powers $A^k$ of the matrix $A$ in terms of powers of $D$. Indeed, we can solve Equation (3.8) for $A$ to get $A = PDP^{-1}$. Squaring this gives
$A^2 = \left(PDP^{-1}\right)\left(PDP^{-1}\right) = PD^2P^{-1}$
Using this we can compute $A^3$ as follows:
$A^3 = AA^2 = \left(PDP^{-1}\right)\left(PD^2P^{-1}\right) = PD^3P^{-1}$
Continuing in this way we obtain Theorem 3.3.1 (even if $D$ is not diagonal).
Theorem 3.3.1
If $A = PDP^{-1}$, then $A^k = PD^kP^{-1}$ for each $k = 1, 2, \dots$.
Hence computing $A^k$ comes down to finding an invertible matrix $P$ as in Equation (3.8). To do this it is necessary to first compute certain numbers (called eigenvalues) associated with the matrix $A$.
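Anticipating the rest of this section, here is a sketch (Python with NumPy; the matrix is the bird-population matrix from Example 3.3.1, and we let NumPy supply $P$ and $D$ rather than computing eigenvalues by hand) showing how Theorem 3.3.1 turns powers of $A$ into powers of a diagonal matrix:

```python
import numpy as np

A = np.array([[0.5, 0.25],
              [2.0, 0.0 ]])

# np.linalg.eig returns the eigenvalues and a matrix P whose columns are
# eigenvectors, so that A P = P D with D diagonal (here P is invertible).
eigenvalues, P = np.linalg.eig(A)
D = np.diag(eigenvalues)

k = 10
Ak_direct = np.linalg.matrix_power(A, k)
Ak_via_D = P @ np.linalg.matrix_power(D, k) @ np.linalg.inv(P)   # A^k = P D^k P^{-1}
print(np.allclose(Ak_direct, Ak_via_D))
```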
Eigenvalues and Eigenvectors
Definition 3.4 Eigenvalues and Eigenvectors of a Matrix
If $A$ is an $n \times n$ matrix, a number $\lambda$ is called an eigenvalue of $A$ if
$A\mathbf{x} = \lambda\mathbf{x} \quad \text{for some column } \mathbf{x} \neq \mathbf{0} \text{ in } \mathbb{R}^n$
In this case, $\mathbf{x}$ is called an eigenvector of $A$ corresponding to the eigenvalue $\lambda$, or a $\lambda$-eigenvector for short.
Example 3.3.2
The matrix $A$ in Example 3.3.2 has another eigenvalue in addition to the one exhibited there. To find it, we develop a general procedure that works for any square matrix $A$.
By definition a number $\lambda$ is an eigenvalue of the $n \times n$ matrix $A$ if and only if $A\mathbf{x} = \lambda\mathbf{x}$ for some column $\mathbf{x} \neq \mathbf{0}$. This is equivalent to asking that the homogeneous system
$(\lambda I - A)\mathbf{x} = \mathbf{0}$
of linear equations has a nontrivial solution $\mathbf{x} \neq \mathbf{0}$. By Theorem 2.4.5 this happens if and only if the matrix $\lambda I - A$ is not invertible and this, in turn, holds if and only if the determinant of the coefficient matrix is zero:
$\det(\lambda I - A) = 0$
This last condition prompts the following definition:
Definition 3.5 Characteristic Polynomial of a Matrix
If $A$ is an $n \times n$ matrix, the characteristic polynomial $c_A(x)$ of $A$ is defined by
$c_A(x) = \det(xI - A)$
Note that $c_A(x)$ is indeed a polynomial in the variable $x$, and it has degree $n$ when $A$ is an $n \times n$ matrix (this is illustrated in the examples below). The above discussion shows that a number $\lambda$ is an eigenvalue of $A$ if and only if $c_A(\lambda) = 0$, that is if and only if $\lambda$ is a root of the characteristic polynomial $c_A(x)$. We record these observations in
Theorem 3.3.2
Let $A$ be an $n \times n$ matrix.
- The eigenvalues $\lambda$ of $A$ are the roots of the characteristic polynomial $c_A(x)$ of $A$.
- The $\lambda$-eigenvectors $\mathbf{x}$ are the nonzero solutions to the homogeneous system
$(\lambda I - A)\mathbf{x} = \mathbf{0}$
of linear equations with $\lambda I - A$ as coefficient matrix.
In practice, solving the equations in part 2 of Theorem 3.3.2 is a routine application of gaussian elimination, but finding the eigenvalues can be difficult, often requiring computers. For now, the examples and exercises will be constructed so that the roots of the characteristic polynomials are relatively easy to find
(usually integers). However, the reader should not be misled by this into thinking that eigenvalues are so easily obtained for the matrices that occur in practical applications!
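To connect Theorem 3.3.2 with computation, here is a sketch (Python with NumPy, using a small matrix of our own choosing) that finds the eigenvalues as the roots of the characteristic polynomial and then finds eigenvectors by solving the corresponding homogeneous system:

```python
import numpy as np

A = np.array([[3.0,  5.0],
              [1.0, -1.0]])
n = A.shape[0]

# Part 1: the eigenvalues are the roots of the characteristic polynomial.
# np.poly(A) returns the coefficients of det(xI - A) for a square matrix A.
coefficients = np.poly(A)
eigenvalues = np.roots(coefficients)
print("characteristic polynomial coefficients:", coefficients)
print("eigenvalues:", eigenvalues)

# Part 2: for each eigenvalue, the eigenvectors are the nonzero solutions of
# (lambda*I - A)x = 0; the null space direction is read off from the singular
# value decomposition of lambda*I - A.
for lam in eigenvalues:
    M = lam * np.eye(n) - A
    _, _, Vt = np.linalg.svd(M)
    x = Vt[-1]                       # a basic eigenvector (any nonzero multiple works)
    print(lam, x, np.allclose(A @ x, lam * x))
```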
Example 3.3.3
discussed in Example 3.3.2, and then find all the eigenvalues and their eigenvectors.
Solution:
Since
we get
Hence, the roots of are and , so these are the eigenvalues of . Note that was the eigenvalue mentioned in Example 3.3.2, but we have found a new one: .
To find the eigenvectors corresponding to , observe that in this case
so the general solution to is
where is an arbitrary real number. Hence, the eigenvectors corresponding to are where is arbitrary. Similarly, gives rise to the eigenvectors which includes the observation in Example 3.3.2.
Note that a square matrix $A$ has many eigenvectors associated with any given eigenvalue $\lambda$. In fact every nonzero solution $\mathbf{x}$ of $(\lambda I - A)\mathbf{x} = \mathbf{0}$ is an eigenvector. Recall that these solutions are all linear combinations of certain basic solutions determined by the gaussian algorithm (see Theorem 1.3.2). Observe that any nonzero multiple of an eigenvector is again an eigenvector, and such multiples are often more convenient. Any set of nonzero multiples of the basic solutions of $(\lambda I - A)\mathbf{x} = \mathbf{0}$ will be called a set of basic eigenvectors corresponding to $\lambda$.
GeoGebra Exercise: Eigenvalue and eigenvectors
https://www.geogebra.org/m/DJXTtm2k
Please answer these questions after you open the webpage:
1. Set the matrix to be
2. Drag the point until you see that the two vectors lie on the same line. Record the value of the point. How many times do you see the two vectors lying on the same line as the point travels through the whole circle? Why?
3. Based on your observation, what can we say about the eigenvalues and eigenvectors of the matrix?
4. Set the matrix to be
and repeat what you did above.
5. Check your lecture notes about the eigenvalues and eigenvectors of this matrix. Are the results consistent with what you observe?
Example 3.3.4:
Find the characteristic polynomial, eigenvalues, and basic eigenvectors for
Solution:
Here the characteristic polynomial is given by
so the eigenvalues are , , and . To find all eigenvectors for , compute
We want the (nonzero) solutions to . The augmented matrix becomes
using row operations. Hence, the general solution to is
where is arbitrary, so we can use
as the basic eigenvector corresponding to . As the reader can verify, the gaussian algorithm gives basic eigenvectors
and
corresponding to and , respectively. Note that to eliminate fractions, we could instead use
as the basic -eigenvector.
Example 3.3.5
If is a square matrix, show that and have the same characteristic polynomial, and hence the same eigenvalues.
Solution:
We use the fact that $(xI - A)^T = xI - A^T$. Then
$c_{A^T}(x) = \det\left(xI - A^T\right) = \det\left[(xI - A)^T\right] = \det(xI - A) = c_A(x)$
by Theorem 3.2.3. Hence $c_{A^T}(x)$ and $c_A(x)$ have the same roots, and so $A^T$ and $A$ have the same eigenvalues (by Theorem 3.3.2).
The eigenvalues of a matrix need not be distinct. For example, if
the characteristic polynomial is $(x - 1)^2$, so the eigenvalue 1 occurs twice. Furthermore, eigenvalues are usually not computed as the roots of the characteristic polynomial. There are iterative, numerical methods that are much more efficient for large matrices.