2 Matrix Algebra
Introduction
In the study of systems of linear equations in Chapter 1, we found it convenient to manipulate the augmented matrix of the system. Our aim was to reduce it to row-echelon form (using elementary row operations) and hence to write down all solutions to the system. In the present chapter we consider matrices for their own sake. While some of the motivation comes from linear equations, it turns out that matrices can be multiplied and added and so form an algebraic system somewhat analogous to the real numbers. This “matrix algebra” is useful in ways that are quite different from the study of linear equations. For example, the geometrical transformations obtained by rotating the euclidean plane about the origin can be viewed as multiplications by certain matrices. These “matrix transformations” are an important tool in geometry and, in turn, the geometry provides a “picture” of the matrices. Furthermore, matrix algebra has many other applications, some of which will be explored in this chapter. This subject is quite old and was first studied systematically in 1858 by Arthur Cayley.
2.1 Matrix Addition, Scalar Multiplication, and Transposition
A rectangular array of numbers is called a matrix (the plural is matrices), and the numbers are called the entries of the matrix. Matrices are usually denoted by uppercase letters: , , , and so on. Hence,
are matrices. Clearly matrices come in various shapes depending on the number of rows and columns. For example, the matrix shown has rows and columns. In general, a matrix with rows and columns is referred to as an matrix or as having size . Thus matrices , , and above have sizes , , and , respectively. A matrix of size is called a row matrix, whereas one of size is called a column matrix. Matrices of size for some are called square matrices.
Each entry of a matrix is identified by the row and column in which it lies. The rows are numbered from the top down, and the columns are numbered from left to right. Then the -entry of a matrix is the number lying simultaneously in row and column . For example,
A special notation is commonly used for the entries of a matrix. If is an matrix, and if the -entry of is denoted as , then is displayed as follows:
This is usually denoted simply as . Thus is the entry in row and column of . For example, a matrix in this notation is written
It is worth pointing out a convention regarding rows and columns: Rows are mentioned before columns. For example:
- If a matrix has size , it has rows and columns.
- If we speak of the -entry of a matrix, it lies in row and column .
- If an entry is denoted , the first subscript refers to the row and the second subscript to the column in which lies.
Two points and in the plane are equal if and only if they have the same coordinates, that is and . Similarly, two matrices and are called equal (written ) if and only if:
- They have the same size.
- Corresponding entries are equal.
If the entries of and are written in the form , , described earlier, then the second condition takes the following form:
Example 2.1.1
discuss the possibility that , , .
Solution:
is impossible because and are of different sizes: is whereas is . Similarly, is impossible. But is possible provided that corresponding entries are equal:
means , , , and .
Matrix Addition
Definition 2.1 Matrix Addition
If and , this takes the form
Note that addition is not defined for matrices of different sizes.
Example 2.1.2
and ,
compute .
Solution:
Example 2.1.3
Solution:
Add the matrices on the left side to obtain
Because corresponding entries must be equal, this gives three equations: , , and . Solving these yields , , .
If , , and are any matrices of the same size, then
(commutative law)
In fact, if and , then the -entries of and are, respectively, and . Since these are equal for all and , we get
The associative law is verified similarly.
The matrix in which every entry is zero is called the zero matrix and is denoted as (or if it is important to emphasize the size). Hence,
holds for all matrices . The negative of an matrix (written ) is defined to be the matrix obtained by multiplying each entry of by . If , this becomes . Hence,
holds for all matrices where, of course, is the zero matrix of the same size as .
A closely related notion is that of subtracting matrices. If and are two matrices, their difference is defined by
Note that if and , then
is the matrix formed by subtracting corresponding entries.
Example 2.1.4
Solution:
Example 2.1.5
where is a matrix.
We solve a numerical equation by subtracting the number from both sides to obtain . This also works for matrices. To solve
simply subtract the matrix
from both sides to get
The reader should verify that this matrix does indeed satisfy the original equation.
The solution in Example 2.1.5 solves the single matrix equation directly via matrix subtraction: . This ability to work with matrices as entities lies at the heart of matrix algebra.
It is important to note that the sizes of matrices involved in some calculations are often determined by the context. For example, if
then and must be the same size (so that makes sense), and that size must be (so that the sum is ). For simplicity we shall often omit reference to such facts when they are clear from the context.
Scalar Multiplication
In gaussian elimination, multiplying a row of a matrix by a number means multiplying every entry of that row by .
Definition 2.2 Matrix Scalar Multiplication
The term scalar arises here because the set of numbers from which the entries are drawn is usually referred to as the set of scalars. We have been using real numbers as scalars, but we could equally well have been using complex numbers.
Example 2.1.6
and
compute , , and .
Solution:
If is any matrix, note that is the same size as for all scalars . We also have
because the zero matrix has every entry zero. In other words, if either or . The converse of this statement is also true, as Example 2.1.7 shows.
Example 2.1.7
Solution:
Write so that means for all and . If , there is nothing to do. If , then implies that for all and ; that is, .
For future reference, the basic properties of matrix addition and scalar multiplication are listed in Theorem 2.1.1.
Theorem 2.1.1
Let , , and denote arbitrary matrices where and are fixed. Let and denote arbitrary real numbers. Then
- .
- .
- There is an matrix , such that for each .
- For each there is an matrix, , such that .
- .
- .
- .
- .
Proof:
Properties 1–4 were given previously. To check Property 5, let and denote matrices of the same size. Then , as before, so the -entry of is
But this is just the -entry of , and it follows that . The other Properties can be similarly verified; the details are left to the reader.
The Properties in Theorem 2.1.1 enable us to do calculations with matrices in much the same way that
numerical calculations are carried out. To begin, Property 2 implies that the sum
is the same no matter how it is formed and so is written as . Similarly, the sum
is independent of how it is formed; for example, it equals both and . Furthermore, property 1 ensures that, for example,
In other words, the order in which the matrices are added does not matter. A similar remark applies to sums of five (or more) matrices.
Properties 5 and 6 in Theorem 2.1.1 are called distributive laws for scalar multiplication, and they extend to sums of more than two terms. For example,
Similar observations hold for more than three summands. These facts, together with properties 7 and 8, enable us to simplify expressions by collecting like terms, expanding, and taking common factors in exactly the same way that algebraic expressions involving variables and real numbers are manipulated. The following example illustrates these techniques.
Example 2.1.8
Solution:
The reduction proceeds as though , , and were variables.
Transpose of a Matrix
Many results about a matrix involve the rows of , and the corresponding result for columns is derived in an analogous way, essentially by replacing the word row by the word column throughout. The following definition is made with such applications in mind.
Definition 2.3 Transpose of a Matrix
In other words, the first row of is the first column of (that is it consists of the entries of column 1 in order). Similarly the second row of is the second column of , and so on.
Example 2.1.9
Solution:
If is a matrix, write . Then is the th element of the th row of and so is the th element of the th column of . This means , so the definition of can be stated as follows:
(2.1)
This is useful in verifying the following properties of transposition.
Theorem 2.1.2
Let and denote matrices of the same size, and let denote a scalar.
- If is an matrix, then is an matrix.
- .
- .
- .
Proof:
Property 1 is part of the definition of , and Property 2 follows from (2.1). As to Property 3: If , then , so (2.1) gives
Finally, if , then where Then (2.1) gives Property 4:
There is another useful way to think of transposition. If is an matrix, the elements are called the main diagonal of . Hence the main diagonal extends down and to the right from the upper left corner of the matrix ; it is shaded in the following examples:
Thus forming the transpose of a matrix can be viewed as “flipping” about its main diagonal, or as “rotating” through about the line containing the main diagonal. This makes Property 2 in Theorem~?? transparent.
Example 2.1.10
Solution:
Using Theorem 2.1.2, the left side of the equation is
Hence the equation becomes
Thus
, so finally
.
Note that Example 2.1.10 can also be solved by first transposing both sides, then solving for , and so obtaining . The reader should do this.
The matrix in Example 2.1.9 has the property that . Such matrices are important; a matrix is called symmetric if . A symmetric matrix is necessarily square (if is , then is , so forces ). The name comes from the fact that these matrices exhibit a symmetry about the main diagonal. That is, entries that are directly across the main diagonal from each other are equal.
For example, is symmetric when , , and .
Example 2.1.11
Solution:
We have and , so, by Theorem 2.1.2, we have . Hence is symmetric.
Example 2.1.12
Solution:
If we iterate the given equation, Theorem 2.1.2 gives
Subtracting from both sides gives , so .
2.2 Matrix-Vector Multiplication
Up to now we have used matrices to solve systems of linear equations by manipulating the rows of the augmented matrix. In this section we introduce a different way of describing linear systems that makes more use of the coefficient matrix of the system and leads to a useful way of “multiplying” matrices.
Vectors
It is a well-known fact in analytic geometry that two points in the plane with coordinates and are equal if and only if and . Moreover, a similar condition applies to points in space. We extend this idea as follows.
An ordered sequence of real numbers is called an ordered –tuple. The word “ordered” here reflects our insistence that two ordered -tuples are equal if and only if corresponding entries are the same. In other words,
Thus the ordered -tuples and -tuples are just the ordered pairs and triples familiar from geometry.
Definition 2.4 The set of ordered -tuples of real numbers
There are two commonly used ways to denote the -tuples in : As rows or columns ;
the notation we use depends on the context. In any event they are called vectors or –vectors and will be denoted using bold type such as x or v. For example, an matrix will be written as a row of columns:
If and are two -vectors in , it is clear that their matrix sum is also in as is the scalar multiple for any real number . We express this observation by saying that is closed under addition and scalar multiplication. In particular, all the basic properties in Theorem 2.1.1 are true of these -vectors. These properties are fundamental and will be used frequently below without comment. As for matrices in general, the zero matrix is called the zero –vector in and, if is an -vector, the -vector is called the negative .
Of course, we have already encountered these -vectors in Section 1.3 as the solutions to systems of linear equations with variables. In particular we defined the notion of a linear combination of vectors and showed that a linear combination of solutions to a homogeneous system is again a solution. Clearly, a linear combination of -vectors in is again in , a fact that we will be using.
Matrix-Vector Multiplication
Given a system of linear equations, the left sides of the equations depend only on the coefficient matrix and the column of variables, and not on the constants. This observation leads to a fundamental idea in linear algebra: We view the left sides of the equations as the “product” of the matrix and the vector . This simple change of perspective leads to a completely new way of viewing linear systems—one that is very useful and will occupy our attention throughout this book.
To motivate the definition of the “product” , consider first the following system of two equations in three variables:
(2.2)
and let , , denote the coefficient matrix, the variable matrix, and the constant matrix, respectively. The system (2.2) can be expressed as a single vector equation
which in turn can be written as follows:
Now observe that the vectors appearing on the left side are just the columns
of the coefficient matrix . Hence the system (2.2) takes the form
(2.3)
This shows that the system (2.2) has a solution if and only if the constant matrix is a linear combination of the columns of , and that in this case the entries of the solution are the coefficients , , and in this linear combination.
Moreover, this holds in general. If is any matrix, it is often convenient to view as a row of columns. That is, if are the columns of , we write
and say that is given in terms of its columns.
Now consider any system of linear equations with coefficient matrix . If is the constant matrix of the system, and if
is the matrix of variables then, exactly as above, the system can be written as a single vector equation
(2.4)
Example 2.2.1
in the form given in (2.4).
Solution:
As mentioned above, we view the left side of (2.4) as the product of the matrix and the vector . This basic idea is formalized in the following definition:
Definition 2.5 Matrix-Vector Multiplication
is any n-vector, the product is defined to be the -vector given by:
In other words, if is and is an -vector, the product is the linear combination of the columns of where the coefficients are the entries of (in order).
Note that if is an matrix, the product is only defined if is an -vector and then the vector is an -vector because this is true of each column of . But in this case the system of linear equations with coefficient matrix and constant vector takes the form of a single matrix equation
The following theorem combines Definition 2.5 and equation (2.4) and summarizes the above discussion. Recall that a system of linear equations is said to be consistent if it has at least one solution.
Theorem 2.2.1
- Every system of linear equations has the form where is the coefficient matrix, is the constant matrix, and is the matrix of variables.
- The system is consistent if and only if is a linear combination of the columns of .
- If are the columns of and if , then is a solution to the linear system if and only if are a solution of the vector equation
A system of linear equations in the form as in (1) of Theorem 2.2.1 is said to be written in matrix form. This is a useful way to view linear systems as we shall see.
Theorem 2.2.1 transforms the problem of solving the linear system into the problem of expressing the constant matrix as a linear combination of the columns of the coefficient matrix . Such a change in perspective is very useful because one approach or the other may be better in a particular situation; the importance of the theorem is that there is a choice.
Example 2.2.2
, compute .
Solution:
By Definition 2.5:
.
Example 2.2.3
Given columns , , , and in , write in the form where is a matrix and is a vector.
Solution:
Here the column of coefficients is
Hence Definition 2.5 gives
where is the matrix with , , , and as its columns.
Example 2.2.4
Let be the matrix given in terms of its columns
, , , and .
In each case below, either express as a linear combination of , , , and , or show that it is not such a linear combination. Explain what your answer means for the corresponding system of linear equations.
1.
2.
Solution:
By Theorem 2.2.1, is a linear combination of , , , and if and only if the system is consistent (that is, it has a solution). So in each case we carry the augmented matrix of the system to reduced form.
1. Here
, so the system has no solution in this case. Hence is \textit{not} a linear combination of , , , and .
2. Now
, so the system is consistent.
Thus is a linear combination of , , , and in this case. In fact the general solution is , , , and where and are arbitrary parameters. Hence
for any choice of and . If we take and , this becomes , whereas taking gives .
Example 2.2.5
Example 2.2.6
Solution:
If then Definition 2.5 gives
The matrix in Example 2.2.6 is called the identity matrix, and we will encounter such matrices again in future. Before proceeding, we develop some algebraic properties of matrix-vector multiplication that are used extensively throughout linear algebra.
Theorem 2.2.2
Let and be matrices, and let and be -vectors in . Then:
- .
- for all scalars .
- .
Proof:
We prove (3); the other verifications are similar and are left as exercises. Let and be given in terms of their columns. Since adding two matrices is the same as adding their columns, we have
If we write
Definition 2.5 gives
Theorem 2.2.2 allows matrix-vector computations to be carried out much as in ordinary arithmetic. For example, for any matrices and and any -vectors and , we have:
We will use such manipulations throughout the book, often without mention.
Linear Equations
Theorem 2.2.2 also gives a useful way to describe the solutions to a system
of linear equations. There is a related system
called the associated homogeneous system, obtained from the original system by replacing all the constants by zeros. Suppose is a solution to and is a solution to (that is and ). Then is another solution to . Indeed, Theorem 2.2.2 gives
This observation has a useful converse.
Theorem 2.2.3
for some solution of the associated homogeneous system .
Proof:
Suppose is also a solution to , so that . Write . Then and, using Theorem 2.2.2, we compute
Hence is a solution to the associated homogeneous system .
Note that gaussian elimination provides one such representation.
Example 2.2.7
Solution:
Gaussian elimination gives , , , and where and are arbitrary parameters. Hence the general solution can be written
Thus
is a particular solution (where ), and
gives all solutions to the associated homogeneous system. (To see why this is so, carry out the gaussian elimination again but with all the constants set equal to zero.)
The following useful result is included with no proof.
Theorem 2.2.4
The Dot Product
Definition 2.5 is not always the easiest way to compute a matrix-vector product because it requires that the columns of be explicitly identified. There is another way to find such a product which uses the matrix as a whole with no reference to its columns, and hence is useful in practice. The method depends on the following notion.
Definition 2.6 Dot Product in
obtained by multiplying corresponding entries and adding the results.
To see how this relates to matrix products, let denote a matrix and let be a -vector. Writing
in the notation of Section 2.1, we compute
From this we see that each entry of is the dot product of the corresponding row of with . This computation goes through in general, and we record the result in Theorem 2.2.5.
Theorem 2.2.5 Dot Product Rule
This result is used extensively throughout linear algebra.
If is and is an -vector, the computation of by the dot product rule is simpler than using Definition 2.5 because the computation can be carried out directly with no explicit reference to the columns of (as in Definition 2.5. The first entry of is the dot product of row 1 of with . In hand calculations this is computed by going across row one of , going down the column , multiplying corresponding entries, and adding the results. The other entries of are computed in the same way using the other rows of with the column .
In general, compute entry of as follows (see the diagram):
Go across row of and down column , multiply corresponding entries, and add the results.
As an illustration, we rework Example 2.2.2 using the dot product rule instead of Definition 2.5.
Example 2.2.8
and , compute .
Solution:
The entries of are the dot products of the rows of with :
Of course, this agrees with the outcome in Example 2.2.2.
Example 2.2.9
Solution:
Write , , and . Then the dot product rule gives , so the entries of are the left sides of the equations in the linear system. Hence the system becomes because matrices are equal if and only corresponding entries are equal.
Example 2.2.10
If is the zero matrix, then for each -vector .
Solution:
For each , entry of is the dot product of row of with , and this is zero because row of consists of zeros.
Definition 2.7 The Identity Matrix
The first few identity matrices are
In Example 2.2.6 we showed that for each -vector using Definition 2.5. The following result shows that this holds in general, and is the reason for the name.
Example 2.2.11
Solution:
We verify the case . Given the -vector
the dot product rule gives
In general, because entry of is the dot product of row of with , and row of has in position and zeros elsewhere.
Example 2.2.12
Solution:
Write
where , but for all . Then Theorem 2.2.5 gives
Example 2.2.12will be referred to later; for now we use it to prove:
Theorem 2.2.6
Proof:
Write and and in terms of their columns. It is enough to show that holds for all . But we are assuming that , which gives by Example 2.2.12.
We have introduced matrix-vector multiplication as a new way to think about systems of linear equations. But it has several other uses as well. It turns out that many geometric operations can be described using matrix multiplication, and we now investigate how this happens. As a bonus, this description provides a geometric “picture” of a matrix by revealing the effect on a vector when it is multiplied by . This “geometric view” of matrices is a fundamental tool in understanding them.
2.3 Matrix Multiplication
In Section 2.2 matrix-vector products were introduced. If is an matrix, the product was defined for any -column in as follows: If where the are the columns of , and if ,
Definition 2.5 reads
(2.5)
This was motivated as a way of describing systems of linear equations with coefficient matrix . Indeed every such system has the form where is the column of constants.
In this section we extend this matrix-vector multiplication to a way of multiplying matrices in general, and then investigate matrix algebra for its own sake. While it shares several properties of ordinary arithmetic, it will soon become clear that matrix arithmetic is different in a number of ways.
Definition 2.9 Matrix Multiplication
Thus the product matrix is given in terms of its columns : Column of is the matrix-vector product of and the corresponding column of . Note that each such product makes sense by Definition 2.5 because is and each is in (since has rows). Note also that if is a column matrix, this definition reduces to Definition 2.5 for matrix-vector multiplication.
Given matrices and , Definition 2.9 and the above computation give
for all in . We record this for reference.
Theorem 2.3.1
Here is an example of how to compute the product of two matrices using Definition 2.9.
Example 2.3.1
and
.
Solution:
The columns of are
and , so Definition 2.5 gives
Hence Definition 2.9 above gives .
While Definition 2.9 is important, there is another way to compute the matrix product that gives a way to calculate each individual entry. In Section 2.2 we defined the dot product of two -tuples to be the sum of the products of corresponding entries. We went on to show (Theorem 2.2.5) that if is an matrix and is an -vector, then entry of the product is the dot product of row of with . This observation was called the “dot product rule” for matrix-vector multiplication, and the next theorem shows that it extends to matrix multiplication in general.
Theorem 2.3.2 Dot Product Rule
product of row of with column of .
Proof:
Write in terms of its columns. Then is column of for each . Hence the -entry of is entry of , which is the dot product of row of with . This proves the theorem.
Thus to compute the -entry of , proceed as follows (see the diagram):
Go across row of , and down column of , multiply corresponding entries, and add the results.
Note that this requires that the rows of must be the same length as the columns of . The following rule is useful for remembering this and for deciding the size of the product matrix .
Compatibility Rule
Let and denote matrices. If is and is , the product can be formed if and only if . In this case the size of the product matrix is , and we say that is defined, or that and are compatible for multiplication.
The diagram provides a useful mnemonic for remembering this. We adopt the following convention:
Whenever a product of matrices is written, it is tacitly assumed that the sizes of the factors are such that the product is defined.
To illustrate the dot product rule, we recompute the matrix product in Example 2.3.1.
Example 2.3.3
and .
Solution:
Here is and is , so the product matrix is defined and will be of size . Theorem 2.3.2 gives each entry of as the dot product of the corresponding row of with the corresponding column of that is,
Of course, this agrees with Example 2.3.1.
Example 2.3.4
Then compute .
Solution:
The -entry of is the dot product of row 1 of and column 3 of (highlighted in the following display), computed by multiplying corresponding entries and adding the results.
Similarly, the -entry of involves row 2 of and column 4 of .
Since is and is , the product is .
Example 2.3.5
Solution:
Here, is a matrix and is a matrix, so and are not defined. However, the compatibility rule reads
so both and can be formed and these are and matrices, respectively.
Unlike numerical multiplication, matrix products and need not be equal. In fact they need not even be the same size, as Example 2.3.5 shows. It turns out to be rare that (although it is by no means impossible), and and are said to commute when this happens.
Example 2.3.6
Solution:
, so can occur even if . Next,
Hence , even though and are the same size.
Example 2.3.7
Solution:
These both follow from the dot product rule as the reader should verify. For a more formal proof, write where is column of . Then Definition 2.9 and Example 2.2.1 give
If denotes column of , then for each by Example 2.2.12. Hence Definition 2.9 gives:
The following theorem collects several results about matrix multiplication that are used everywhere in linear algebra.
Theorem 2.3.3
Assume that is any scalar, and that , , and are matrices of sizes such that the indicated matrix products are defined. Then:
1. and where denotes an identity matrix.
2. .
3. .
4. .
5. .
6. .
Proof:
Condition (1) is Example 2.3.7; we prove (2), (4), and (6) and leave (3) and (5) as exercises.
1. If in terms of its columns, then by Definition 2.9, so
4. We know (Theorem 2.2.) that holds for every column . If we write in terms of its columns, we get
6. As in Section 2.1, write and , so that and where and for all and . If denotes the -entry of , then is the dot product of row of with column of . Hence
But this is the dot product of row of with column of ; that is, the -entry of ; that is, the -entry of . This proves (6).
Property 2 in Theorem 2.3.3 is called the associative law of matrix multiplication. It asserts that the equation holds for all matrices (if the products are defined). Hence this product is the same no matter how it is formed, and so is written simply as . This extends: The product of four matrices can be formed several ways—for example, , , and —but the associative law implies that they are all equal and so are written as . A similar remark applies in general: Matrix products can be written unambiguously with no parentheses.
However, a note of caution about matrix multiplication must be taken: The fact that and need not be equal means that the order of the factors is important in a product of matrices. For example and may not be equal.
Warning:
If the order of the factors in a product of matrices is changed, the product matrix may change (or may not be defined). Ignoring this warning is a source of many errors by students of linear algebra!}
Properties 3 and 4 in Theorem 2.3.3 are called distributive laws. They assert that and hold whenever the sums and products are defined. These rules extend to more than two terms and, together with Property 5, ensure that many manipulations familiar from ordinary algebra extend to matrices. For example
Note again that the warning is in effect: For example need not equal . These rules make possible a lot of simplification of matrix expressions.
Example 2.3.8
Solution:
Example 2.3.9 and Example 2.3.10 below show how we can use the properties in Theorem 2.3.2to deduce other facts about matrix multiplication. Matrices and are said to commute if .
Example 2.3.9
Solution:
Showing that commutes with means verifying that . The computation uses the associative law several times, as well as the given facts that and .
Example 2.3.10
Solution:
The following always holds:
(2.6)
Hence if , then follows. Conversely, if this last equation holds, then equation (2.6 becomes
This gives , and follows.
In Section 2.2 we saw (in Theorem 2.2.1 ) that every system of linear equations has the form
where is the coefficient matrix, is the column of variables, and is the constant matrix. Thus the system of linear equations becomes a single matrix equation. Matrix multiplication can yield information about such a system.
Example 2.3.11
Solution:
Suppose that is any solution to the system, so that . Multiply both sides of this matrix equation by to obtain, successively,
This shows that if the system has a solution , then that solution must be , as required. But it does not guarantee that the system has a solution. However, if we write , then
Thus will be a solution if the condition is satisfied.
The ideas in Example 2.3.11 lead to important information about matrices; this will be pursued in the next section.
2.4 Matrix Inverse
Three basic operations on matrices, addition, multiplication, and subtraction, are analogs for matrices of the same operations for numbers. In this section we introduce the matrix analog of numerical division.
To begin, consider how a numerical equation is solved when and are known numbers. If , there is no solution (unless ). But if , we can multiply both sides by the inverse to obtain the solution . Of course multiplying by is just dividing by , and the property of that makes this work is that . Moreover, we saw in Section~?? that the role that plays in arithmetic is played in matrix algebra by the identity matrix . This suggests the following definition.
Definition 2.11 Matrix Inverses
A matrix that has an inverse is called an
Note that only square matrices have inverses. Even though it is plausible that nonsquare matrices and could exist such that and , where is and is , we claim that this forces . Indeed, if there exists a nonzero column such that (by Theorem 1.3.1), so , a contradiction. Hence . Similarly, the condition implies that . Hence so is square.}
Example 2.4.1
is an inverse of .
Solution:
Compute and .
Hence , so is indeed an inverse of .
Example 2.4.2
has no inverse.
Solution:
Let
denote an arbitrary matrix. Then
so has a row of zeros. Hence cannot equal for any .
The argument in Example 2.4.2 shows that no zero matrix has an inverse. But Example 2.4.2 also shows that, unlike arithmetic, it is possible for a nonzero matrix to have no inverse. However, if a matrix does have an inverse, it has only one.
Theorem 2.4.1
Proof:
Since and are both inverses of , we have . Hence
If is an invertible matrix, the (unique) inverse of is denoted . Hence (when it exists) is a square matrix of the same size as with the property that
These equations characterize in the following sense:
Inverse Criterion: If somehow a matrix can be found such that and , then is invertible and is the inverse of ; in symbols, .}
This is a way to verify that the inverse of a matrix exists. Example 2.3.3 and Example 2.3.4 offer illustrations.
Example 2.4.3
Solution:
We have , and so
Hence , as asserted. This can be written as , so it shows that is the inverse of . That is, .
The next example presents a useful formula for the inverse of a matrix when it exists. To state it, we define the and the of the matrix as follows:
Example 2.4.4
Solution:
For convenience, write and
. Then as the reader can verify. So if , scalar multiplication by gives
Hence is invertible and . Thus it remains only to show that if exists, then .
We prove this by showing that assuming leads to a contradiction. In fact, if , then , so left multiplication by gives ; that is, , so . But this implies that , , , and are all zero, so , contrary to the assumption that exists.
As an illustration, if
then . Hence is invertible and , as the reader is invited to verify.
The determinant and adjugate will be defined in Chapter 3 for any square matrix, and the conclusions in Example 2.4.4 will be proved in full generality.
Inverse and Linear systems
Matrix inverses can be used to solve certain systems of linear equations. Recall that a of linear equations can be written as a matrix equation
where and are known and is to be determined. If is invertible, we multiply each side of the equation on the left by to get
This gives the solution to the system of equations (the reader should verify that really does satisfy ). Furthermore, the argument shows that if is solution, then necessarily , so the solution is unique. Of course the technique works only when the coefficient matrix has an inverse. This proves Theorem 2.4.2.
Theorem 2.4.2
If the coefficient matrix is invertible, the system has the unique solution
Example 2.4.5
Solution:
In matrix form this is where , , and . Then , so is invertible and
by Example 2.4.4. Thus Theorem 2.4.2 gives
so the solution is and .
An inversion method
If a matrix is and invertible, it is desirable to have an efficient technique for finding the inverse. The following procedure will be justified in Section 2.5.
Matrix Inversion Algorithm
where the row operations on and are carried out simultaneously.
Example 2.4.6
Solution:
Apply elementary row operations to the double matrix
so as to carry to . First interchange rows 1 and 2.
Next subtract times row 1 from row 2, and subtract row 1 from row 3.
Continue to reduced row-echelon form.
Hence , as is readily verified.
Given any matrix , Theorem 1.2.1 shows that can be carried by elementary row operations to a matrix in reduced row-echelon form. If , the matrix is invertible (this will be proved in the next section), so the algorithm produces . If , then has a row of zeros (it is square), so no system of linear equations can have a unique solution. But then is not invertible by Theorem 2.4.2. Hence, the algorithm is effective in the sense conveyed in Theorem 2.4.3.
Theorem 2.4.3
first case, the algorithm produces ; in the second case, does not exist.
Properties of inverses
The following properties of an invertible matrix are used everywhere.
Example 2.4.7: Cancellation Laws
Let be an invertible matrix. Show that:
1. If , then .
2. If , then .
Solution:
Given the equation , left multiply both sides by to obtain . Thus , that is . This proves (1) and the proof of (2) is left to the reader.
Properties (1) and (2) in Example 2.4.7 are described by saying that an invertible matrix can be “left cancelled” and “right cancelled”, respectively. Note however that “mixed” cancellation does not hold in general: If is invertible and , then and may be equal, even if both are . Here is a specific example:
Sometimes the inverse of a matrix is given by a formula. Example 2.4.4 is one illustration; Example 2.4.8 and Example 2.4.9 provide two more. The idea is the : If a matrix can be found such that , then is invertible and .
Example 2.4.8
Solution:
exists (by assumption). Its transpose is the candidate proposed for the inverse of . Using the inverse criterion, we test it as follows:
Hence is indeed the inverse of ; that is, .
Example 2.4.9
Solution:
We are given a candidate for the inverse of , namely . We test it as follows:
Hence is the inverse of ; in symbols, .
We now collect several basic properties of matrix inverses for reference.
Theorem 2.4.4
All the following matrices are square matrices of the same size.
1. is invertible and .
2. If is invertible, so is , and .
3. If and are invertible, so is , and .
4. If are all invertible, so is their product , and
5. If is invertible, so is for any , and .
6. If is invertible and is a number, then is invertible and .
7. If is invertible, so is its transpose , and .
Proof:
1. This is an immediate consequence of the fact that .
2. The equations show that is the inverse of ; in symbols, .
3. This is Example 2.4.9.
4. Use induction on . If , there is nothing to prove, and if , the result is property 3. If , assume inductively that . We apply this fact together with property 3 as follows:
So the proof by induction is complete.
5. This is property 4 with .
6. The readers are invited to verify it.
7. This is Example 2.4.8.
The reversal of the order of the inverses in properties 3 and 4 of Theorem 2.4.4 is a consequence of the fact that matrix multiplication is not
commutative. Another manifestation of this comes when matrix equations are dealt with. If a matrix equation is given, it can be by a matrix to yield . Similarly, gives . However, we cannot mix the two: If , it need be the case that even if is invertible, for example, , .
Part 7 of Theorem 2.4.4 together with the fact that gives
Corollary 2.4.1
Example 2.4.10
Solution:
By Theorem 2.4.2 (2) and Example 2.4.4, we have
Hence , so
by Theorem 2.4.4(7).
The following important theorem collects a number of conditions all equivalent to invertibility. It will be referred to frequently below.
Theorem 2.4.5 Inverse Theorem
The following conditions are equivalent for an matrix :
1. is invertible.
2. The homogeneous system has only the trivial solution .
3. can be carried to the identity matrix by elementary row operations.
4. The system has at least one solution for every choice of column .
5. There exists an matrix such that .
Proof:
We show that each of these conditions implies the next, and that (5) implies (1).
(1) (2). If exists, then gives .
(2) (3). Assume that (2) is true. Certainly by row operations where is a reduced, row-echelon matrix. It suffices to show that . Suppose that this is not the case. Then has a row of zeros (being square). Now consider the augmented matrix of the system . Then is the reduced form, and also has a row of zeros. Since is square there must be at least one nonleading variable, and hence at least one parameter. Hence the system has infinitely many solutions, contrary to (2). So after all.
(3) (4). Consider the augmented matrix of the system . Using (3), let by a sequence of row operations. Then these same operations carry for some column . Hence the system has a solution (in fact unique) by gaussian elimination. This proves (4).
(4) (5). Write where are the columns of . For each \newline , the system has a solution by (4), so . Now let be the matrix with these matrices as its columns. Then Definition 2.9 gives (5):
(5) (1). Assume that (5) is true so that for some matrix . Then implies (because ). Thus condition (2) holds for the matrix rather than . Hence the argument above that (2) (3) (4) (5) (with replaced by ) shows that a matrix exists such that . But then
Thus which, together with , shows that is the inverse of . This proves (1).
The proof of (5) (1) in Theorem 2.4.5 shows that if for square matrices, then necessarily , and hence that and are inverses of each other. We record this important fact for reference.
Corollary 2.4.1
Here is a quick way to remember Corollary 2.4.1. If is a square matrix, then
1. If then .
2. If then .
Observe that Corollary 2.4.1 is false if and are not square matrices. For example, we have
In fact, it can be verified that if and , where is and is , then and and are (square) inverses of each other.
An matrix has if and only if (3) of Theorem 2.4.5 holds. Hence
Corollary 2.4.2