
# 5.1 Subspaces and Spanning

In Section 2.2 we introduced the set $\mathbb{R}^n$ of all $n$-tuples (called \textit{vectors}), and began our investigation of the matrix transformations $\mathbb{R}^n \to \mathbb{R}^m$ given by matrix multiplication by an $m \times n$ matrix. Particular attention was paid to the euclidean plane $\mathbb{R}^2$ where certain simple geometric transformations were seen to be matrix transformations.

In this chapter we investigate $\mathbb{R}^n$ in full generality, and introduce some of the most important concepts and methods in linear algebra. The $n$-tuples in $\mathbb{R}^n$ will continue to be denoted $\mathbf{x}$, $\mathbf{y}$, and so on, and will be written as rows or columns depending on the context.

## Subspaces of $\mathbb{R}^n$

Definition 5.1 Subspaces of $\mathbb{R}^n$

A set $U$ of vectors in $\mathbb{R}^n$ is called a subspace of $\mathbb{R}^n$ if it satisfies the following properties:

S1. The zero vector $\mathbf{0}$ is in $U$.

S2. If $\mathbf{x}$ is in $U$ and $\mathbf{y}$ is in $U$, then $\mathbf{x} + \mathbf{y}$ is in $U$.

S3. If $\mathbf{x}$ is in $U$, then $a\mathbf{x}$ is in $U$ for every real number $a$.

We say that the subset $U$ is closed under addition if S2 holds, and that $U$ is closed under scalar multiplication if S3 holds.

Clearly $\mathbb{R}^n$ is a subspace of itself, and this chapter is about these subspaces and their properties. The set $\{\mathbf{0}\}$, consisting of only the zero vector, is also a subspace because $\mathbf{0} + \mathbf{0} = \mathbf{0}$ and $a\mathbf{0} = \mathbf{0}$ for each $a$ in $\mathbb{R}$; it is called the zero subspace. Any subspace of $\mathbb{R}^n$ other than $\{\mathbf{0}\}$ or $\mathbb{R}^n$ is called a proper subspace.

We saw in Section 4.2 that every plane $M$ through the origin in $\mathbb{R}^3$ has equation $ax + by + cz = 0$ where $a$, $b$, and $c$ are not all zero. Here $\mathbf{n} = \begin{bmatrix} a \\ b \\ c \end{bmatrix}$ is a normal for the plane and

$$M = \{\mathbf{v} \in \mathbb{R}^3 \mid \mathbf{n} \cdot \mathbf{v} = 0\}$$

where $\mathbf{v} = \begin{bmatrix} x \\ y \\ z \end{bmatrix}$ and $\mathbf{n} \cdot \mathbf{v}$ denotes the dot product introduced in Section 2.2 (see the diagram). Then $M$ is a subspace of $\mathbb{R}^3$. Indeed we show that $M$ satisfies S1, S2, and S3 as follows:

S1. $\mathbf{0}$ is in $M$ because $\mathbf{n} \cdot \mathbf{0} = 0$;

S2. If $\mathbf{v}$ and $\mathbf{v}_1$ are in $M$, then $\mathbf{n} \cdot (\mathbf{v} + \mathbf{v}_1) = \mathbf{n} \cdot \mathbf{v} + \mathbf{n} \cdot \mathbf{v}_1 = 0 + 0 = 0$, so $\mathbf{v} + \mathbf{v}_1$ is in $M$;

S3. If $\mathbf{v}$ is in $M$, then $\mathbf{n} \cdot (a\mathbf{v}) = a(\mathbf{n} \cdot \mathbf{v}) = a \cdot 0 = 0$, so $a\mathbf{v}$ is in $M$.
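The closure checks S2 and S3 can also be carried out numerically. Below is a minimal Python sketch; the normal $\mathbf{n}$ and the two vectors on the plane are assumed examples, not taken from the text:

```python
# A plane through the origin in R^3 with normal n consists of all v with n . v = 0.
# Check closure under addition (S2) and scalar multiplication (S3) numerically.

def dot(u, v):
    """Dot product of two vectors given as lists."""
    return sum(ui * vi for ui, vi in zip(u, v))

n = [1.0, -2.0, 3.0]          # assumed normal vector
v1 = [2.0, 1.0, 0.0]          # n . v1 = 2 - 2 + 0 = 0, so v1 lies on the plane
v2 = [3.0, 0.0, -1.0]         # n . v2 = 3 + 0 - 3 = 0, so v2 lies on the plane

v_sum = [a + b for a, b in zip(v1, v2)]   # S2: the sum stays on the plane
v_scaled = [5.0 * a for a in v1]          # S3: a scalar multiple stays on the plane

print(dot(n, v_sum), dot(n, v_scaled))    # both dot products are 0.0
```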

Example 5.1.1

Planes and lines through the origin in $\mathbb{R}^3$ are all subspaces of $\mathbb{R}^3$.

Solution:

We proved the statement for planes above. If $L$ is a line through the origin with direction vector $\mathbf{d}$, then $L = \{t\mathbf{d} \mid t \in \mathbb{R}\}$. Can you verify that $L$ satisfies S1, S2, and S3?

Example 5.1.1 shows that lines through the origin in $\mathbb{R}^2$ are subspaces; in fact, they are the only proper subspaces of $\mathbb{R}^2$. Indeed, we will prove that lines and planes through the origin in $\mathbb{R}^3$ are the only proper subspaces of $\mathbb{R}^3$ (Example 5.2.14). Thus the geometry of lines and planes through the origin is captured by the subspace concept. (Note that every line or plane is just a translation of one of these.)

Subspaces can also be used to describe important features of an $m \times n$ matrix $A$. The null space of $A$, denoted $\operatorname{null} A$, and the image space of $A$, denoted $\operatorname{im} A$, are defined by

$$\operatorname{null} A = \{\mathbf{x} \in \mathbb{R}^n \mid A\mathbf{x} = \mathbf{0}\} \quad \text{and} \quad \operatorname{im} A = \{A\mathbf{x} \mid \mathbf{x} \in \mathbb{R}^n\}$$

In the language of Chapter 2, $\operatorname{null} A$ consists of all solutions $\mathbf{x}$ in $\mathbb{R}^n$ of the homogeneous system $A\mathbf{x} = \mathbf{0}$, and $\operatorname{im} A$ is the set of all vectors $\mathbf{y}$ in $\mathbb{R}^m$ such that $A\mathbf{x} = \mathbf{y}$ has a solution $\mathbf{x}$. Note that $\mathbf{x}$ is in $\operatorname{null} A$ if it satisfies the condition $A\mathbf{x} = \mathbf{0}$, while $\operatorname{im} A$ consists of vectors of the form $A\mathbf{x}$ for some $\mathbf{x}$ in $\mathbb{R}^n$. These two ways to describe subsets occur frequently.

Example 5.1.2

If $A$ is an $m \times n$ matrix, then:

1. $\operatorname{null} A$ is a subspace of $\mathbb{R}^n$.
2. $\operatorname{im} A$ is a subspace of $\mathbb{R}^m$.

Solution:

1. The zero vector $\mathbf{0}$ lies in $\operatorname{null} A$ because $A\mathbf{0} = \mathbf{0}$. If $\mathbf{x}$ and $\mathbf{x}_1$ are in $\operatorname{null} A$, then $\mathbf{x} + \mathbf{x}_1$ and $a\mathbf{x}$ are in $\operatorname{null} A$ because they satisfy the required condition:

$$A(\mathbf{x} + \mathbf{x}_1) = A\mathbf{x} + A\mathbf{x}_1 = \mathbf{0} + \mathbf{0} = \mathbf{0} \quad \text{and} \quad A(a\mathbf{x}) = a(A\mathbf{x}) = a\mathbf{0} = \mathbf{0}$$

Hence $\operatorname{null} A$ satisfies S1, S2, and S3, and so is a subspace of $\mathbb{R}^n$.

2. The zero vector $\mathbf{0}$ lies in $\operatorname{im} A$ because $\mathbf{0} = A\mathbf{0}$. Suppose that $\mathbf{y}$ and $\mathbf{y}_1$ are in $\operatorname{im} A$, say $\mathbf{y} = A\mathbf{x}$ and $\mathbf{y}_1 = A\mathbf{x}_1$ where $\mathbf{x}$ and $\mathbf{x}_1$ are in $\mathbb{R}^n$. Then

$$\mathbf{y} + \mathbf{y}_1 = A\mathbf{x} + A\mathbf{x}_1 = A(\mathbf{x} + \mathbf{x}_1) \quad \text{and} \quad a\mathbf{y} = a(A\mathbf{x}) = A(a\mathbf{x})$$

show that $\mathbf{y} + \mathbf{y}_1$ and $a\mathbf{y}$ are both in $\operatorname{im} A$ (they have the required form). Hence $\operatorname{im} A$ is a subspace of $\mathbb{R}^m$.
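The two ways of describing these subsets can be illustrated concretely. The following Python sketch, with an assumed $2 \times 3$ matrix, verifies the defining condition for a null-space vector and constructs a vector of the required form for the image space:

```python
# null A = {x : A x = 0};  im A = {A x : x in R^n}.

def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[1, 2, 1],
     [0, 1, 1]]               # assumed example matrix

x = [1, -1, 1]                # A x = (1 - 2 + 1, -1 + 1) = (0, 0): x is in null A
print(matvec(A, x))           # [0, 0]

y = [2, 0, 1]                 # any choice of y produces a vector A y in im A
b = matvec(A, y)              # b = (2 + 0 + 1, 0 + 0 + 1) = (3, 1)
print(b)                      # [3, 1]
```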

There are other important subspaces associated with a matrix $A$ that clarify basic properties of $A$. If $A$ is an $n \times n$ matrix and $\lambda$ is any number, let

$$E_\lambda(A) = \{\mathbf{x} \in \mathbb{R}^n \mid A\mathbf{x} = \lambda\mathbf{x}\}$$

A vector $\mathbf{x}$ is in $E_\lambda(A)$ if and only if $(\lambda I - A)\mathbf{x} = \mathbf{0}$, so Example 5.1.2 gives:

Example 5.1.3

$E_\lambda(A) = \operatorname{null}(\lambda I - A)$ is a subspace of $\mathbb{R}^n$ for each $n \times n$ matrix $A$ and number $\lambda$.

$E_\lambda(A)$ is called the eigenspace of $A$ corresponding to $\lambda$. The reason for the name is that, in the terminology of Section 3.3, $\lambda$ is an eigenvalue of $A$ if $E_\lambda(A) \neq \{\mathbf{0}\}$. In this case the nonzero vectors in $E_\lambda(A)$ are called the eigenvectors of $A$ corresponding to $\lambda$.
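Membership in an eigenspace amounts to checking $A\mathbf{x} = \lambda\mathbf{x}$ directly. A minimal Python sketch, with an assumed $2 \times 2$ matrix and eigenvector:

```python
# x is in the eigenspace E_lambda(A) exactly when A x = lambda * x.

def matvec(A, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * c for a, c in zip(row, v)) for row in A]

A = [[2, 1],
     [1, 2]]                  # assumed symmetric example
v = [1, 1]                    # A v = (3, 3) = 3 v, so v is an eigenvector
lam = 3

print(matvec(A, v), [lam * c for c in v])   # the two sides agree
```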

The reader should not get the impression that every subset of is a subspace. For example:

Hence neither nor is a subspace of .

## Spanning sets

Let $\mathbf{v}$ and $\mathbf{w}$ be two nonzero, nonparallel vectors in $\mathbb{R}^3$ with their tails at the origin. The plane $M$ through the origin containing these vectors is described in Section 4.2 by saying that $\mathbf{n} = \mathbf{v} \times \mathbf{w}$ is a normal for $M$, and that $M$ consists of all vectors $\mathbf{p}$ such that $\mathbf{n} \cdot \mathbf{p} = 0$.
While this is a very useful way to look at planes, there is another approach that is at least as useful in $\mathbb{R}^3$ and, more importantly, works for all subspaces of $\mathbb{R}^n$ for any $n$.

The idea is as follows: Observe that, by the diagram, a vector $\mathbf{p}$ is in $M$ if and only if it has the form

$$\mathbf{p} = a\mathbf{v} + b\mathbf{w}$$

for certain real numbers $a$ and $b$ (we say that $\mathbf{p}$ is a linear combination of $\mathbf{v}$ and $\mathbf{w}$).
Hence we can describe $M$ as

$$M = \{a\mathbf{v} + b\mathbf{w} \mid a, b \in \mathbb{R}\}$$

and we say that $\{\mathbf{v}, \mathbf{w}\}$ is a spanning set for $M$. It is this notion of a spanning set that provides a way to describe all subspaces of $\mathbb{R}^n$.

As in Section 1.3, given vectors $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k$ in $\mathbb{R}^n$, a vector of the form

$$t_1\mathbf{x}_1 + t_2\mathbf{x}_2 + \cdots + t_k\mathbf{x}_k \quad \text{where the } t_i \text{ are scalars}$$

is called a linear combination of the $\mathbf{x}_i$, and $t_i$ is called the coefficient of $\mathbf{x}_i$ in the linear combination.

Definition 5.2 Linear Combinations and Span in $\mathbb{R}^n$

The set of all such linear combinations is called the span of the $\mathbf{x}_i$ and is denoted

$$\operatorname{span}\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k\} = \{t_1\mathbf{x}_1 + t_2\mathbf{x}_2 + \cdots + t_k\mathbf{x}_k \mid t_i \text{ in } \mathbb{R}\}$$

If $V = \operatorname{span}\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k\}$, we say that $V$ is spanned by the vectors $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k$, and that the vectors $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k$ span the space $V$.

Here are two examples:

which we write as for simplicity.

In particular, the above discussion shows that, if $\mathbf{v}$ and $\mathbf{w}$ are two nonzero, nonparallel vectors in $\mathbb{R}^3$, then

$$M = \operatorname{span}\{\mathbf{v}, \mathbf{w}\}$$

is the plane in $\mathbb{R}^3$ containing $\mathbf{v}$ and $\mathbf{w}$. Moreover, if $\mathbf{d}$ is any nonzero vector in $\mathbb{R}^3$ (or $\mathbb{R}^2$), then

$$L = \operatorname{span}\{\mathbf{d}\} = \{t\mathbf{d} \mid t \in \mathbb{R}\}$$

is the line with direction vector $\mathbf{d}$. Hence lines and planes can both be described in terms of spanning sets.

Example 5.1.4

Let $\mathbf{x}$ and $\mathbf{y}$ be the given vectors. Determine whether the given vectors $\mathbf{u}$ and $\mathbf{v}$ are in $U = \operatorname{span}\{\mathbf{x}, \mathbf{y}\}$.

Solution:

The vector $\mathbf{u}$ is in $U$ if and only if $\mathbf{u} = s\mathbf{x} + t\mathbf{y}$ for scalars $s$ and $t$. Equating components gives a system of equations in $s$ and $t$.

This linear system has a solution $s$ and $t$, so $\mathbf{u}$ is in $U$. On the other hand, asking that $\mathbf{v} = s\mathbf{x} + t\mathbf{y}$ leads to a system of equations

and this system has no solution. So $\mathbf{v}$ does not lie in $U$.
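Testing span membership as in this example means solving $\mathbf{v} = a\mathbf{x} + b\mathbf{y}$ for scalars $a$ and $b$. A Python sketch in exact arithmetic; the vectors below are assumed examples (not the ones from the text), and the helper assumes the leading $2 \times 2$ block is nonsingular:

```python
# Is v in span{x, y}?  Solve v = a*x + b*y from the first two components,
# then check the remaining components for consistency.
from fractions import Fraction

def in_span2(x, y, v):
    """Return (a, b) with v = a*x + b*y if such scalars exist, else None."""
    det = x[0] * y[1] - x[1] * y[0]
    if det == 0:
        return None  # this simple sketch needs a nonsingular leading 2x2 block
    a = Fraction(v[0] * y[1] - v[1] * y[0], det)
    b = Fraction(x[0] * v[1] - x[1] * v[0], det)
    ok = all(a * xi + b * yi == vi for xi, yi, vi in zip(x, y, v))
    return (a, b) if ok else None

x, y = [1, 2, -1], [1, 0, 1]          # assumed spanning vectors
print(in_span2(x, y, [5, 4, 1]))      # a = 2, b = 3: in the span
print(in_span2(x, y, [0, 0, 1]))      # None: not in the span
```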

Theorem 5.1.1

Let $U = \operatorname{span}\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k\}$ in $\mathbb{R}^n$. Then:

1. $U$ is a subspace of $\mathbb{R}^n$ containing each $\mathbf{x}_i$.
2. If $W$ is a subspace of $\mathbb{R}^n$ and each $\mathbf{x}_i$ is in $W$, then $U \subseteq W$.

Proof:

1. The zero vector $\mathbf{0}$ is in $U$ because $\mathbf{0} = 0\mathbf{x}_1 + 0\mathbf{x}_2 + \cdots + 0\mathbf{x}_k$ is a linear combination of the $\mathbf{x}_i$. If $\mathbf{x} = t_1\mathbf{x}_1 + \cdots + t_k\mathbf{x}_k$ and $\mathbf{y} = s_1\mathbf{x}_1 + \cdots + s_k\mathbf{x}_k$ are in $U$, then $\mathbf{x} + \mathbf{y}$ and $a\mathbf{x}$ are in $U$ because

$$\mathbf{x} + \mathbf{y} = (t_1 + s_1)\mathbf{x}_1 + \cdots + (t_k + s_k)\mathbf{x}_k \quad \text{and} \quad a\mathbf{x} = (at_1)\mathbf{x}_1 + \cdots + (at_k)\mathbf{x}_k$$

Finally each $\mathbf{x}_i$ is in $U$ (for example, $\mathbf{x}_2 = 0\mathbf{x}_1 + 1\mathbf{x}_2 + \cdots + 0\mathbf{x}_k$) so S1, S2, and S3 are satisfied for $U$, proving (1).

2. Let $\mathbf{y} = t_1\mathbf{x}_1 + \cdots + t_k\mathbf{x}_k$ where the $t_i$ are scalars and each $\mathbf{x}_i$ is in $W$. Then each $t_i\mathbf{x}_i$ is in $W$ because $W$ satisfies S3. But then $\mathbf{y}$ is in $W$ because $W$ satisfies S2 (verify). This proves (2).

Condition (2) in Theorem 5.1.1 can be expressed by saying that $\operatorname{span}\{\mathbf{x}_1, \dots, \mathbf{x}_k\}$ is the smallest subspace of $\mathbb{R}^n$ that contains each $\mathbf{x}_i$. This is useful for showing that two subspaces $U$ and $W$ are equal, since this amounts to showing that both $U \subseteq W$ and $W \subseteq U$. Here is an example of how it is used.

Example 5.1.5

If $\mathbf{x}$ and $\mathbf{y}$ are in $\mathbb{R}^n$, show that $\operatorname{span}\{\mathbf{x}, \mathbf{y}\} = \operatorname{span}\{\mathbf{x} + \mathbf{y}, \mathbf{x} - \mathbf{y}\}$.

Solution:

Since both $\mathbf{x} + \mathbf{y}$ and $\mathbf{x} - \mathbf{y}$ are in $\operatorname{span}\{\mathbf{x}, \mathbf{y}\}$, Theorem 5.1.1 gives

$$\operatorname{span}\{\mathbf{x} + \mathbf{y}, \mathbf{x} - \mathbf{y}\} \subseteq \operatorname{span}\{\mathbf{x}, \mathbf{y}\}$$

But $\mathbf{x} = \frac{1}{2}(\mathbf{x} + \mathbf{y}) + \frac{1}{2}(\mathbf{x} - \mathbf{y})$ and $\mathbf{y} = \frac{1}{2}(\mathbf{x} + \mathbf{y}) - \frac{1}{2}(\mathbf{x} - \mathbf{y})$ are both in $\operatorname{span}\{\mathbf{x} + \mathbf{y}, \mathbf{x} - \mathbf{y}\}$, so

$$\operatorname{span}\{\mathbf{x}, \mathbf{y}\} \subseteq \operatorname{span}\{\mathbf{x} + \mathbf{y}, \mathbf{x} - \mathbf{y}\}$$

again by Theorem 5.1.1. Thus $\operatorname{span}\{\mathbf{x}, \mathbf{y}\} = \operatorname{span}\{\mathbf{x} + \mathbf{y}, \mathbf{x} - \mathbf{y}\}$, as desired.

It turns out that many important subspaces are best described by giving a spanning set. Here are three examples, beginning with an important spanning set for $\mathbb{R}^n$ itself. Column $j$ of the $n \times n$ identity matrix $I_n$ is denoted $\mathbf{e}_j$ and called the $j$th coordinate vector in $\mathbb{R}^n$, and the set $\{\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n\}$ is called the standard basis of $\mathbb{R}^n$. If $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$
is any vector in $\mathbb{R}^n$, then $\mathbf{x} = x_1\mathbf{e}_1 + x_2\mathbf{e}_2 + \cdots + x_n\mathbf{e}_n$, as the reader can verify. This proves:

Example 5.1.6

$\mathbb{R}^n = \operatorname{span}\{\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n\}$ where $\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n$ are the columns of $I_n$.
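The expansion $\mathbf{x} = x_1\mathbf{e}_1 + \cdots + x_n\mathbf{e}_n$ is easy to verify mechanically. A short Python sketch with an assumed sample vector:

```python
# Every x in R^n expands in the standard basis as x = x1*e1 + ... + xn*en.

def e(i, n):
    """The i-th standard coordinate vector in R^n (1-indexed)."""
    return [1 if j == i - 1 else 0 for j in range(n)]

x = [3, -1, 4]   # assumed sample vector
expansion = [sum(x[i] * e(i + 1, 3)[j] for i in range(3)) for j in range(3)]
print(expansion)  # recovers x itself
```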

If $A$ is an $m \times n$ matrix, the next two examples show that it is a routine matter to find spanning sets for $\operatorname{null} A$ and $\operatorname{im} A$.

Example 5.1.7

Given an $m \times n$ matrix $A$, let $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k$ denote the basic solutions to the system $A\mathbf{x} = \mathbf{0}$ given by the gaussian algorithm. Then

$$\operatorname{null} A = \operatorname{span}\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k\}$$

Solution:

If $\mathbf{x}$ is in $\operatorname{null} A$, then $A\mathbf{x} = \mathbf{0}$ so Theorem 1.3.2 shows that $\mathbf{x}$ is a linear combination of the basic solutions; that is, $\mathbf{x}$ is in $\operatorname{span}\{\mathbf{x}_1, \dots, \mathbf{x}_k\}$. On the other hand, if $\mathbf{x}$ is in $\operatorname{span}\{\mathbf{x}_1, \dots, \mathbf{x}_k\}$, then $\mathbf{x} = t_1\mathbf{x}_1 + \cdots + t_k\mathbf{x}_k$ for scalars $t_i$, so

$$A\mathbf{x} = t_1A\mathbf{x}_1 + \cdots + t_kA\mathbf{x}_k = t_1\mathbf{0} + \cdots + t_k\mathbf{0} = \mathbf{0}$$

This shows that $\mathbf{x}$ is in $\operatorname{null} A$, and hence that $\operatorname{span}\{\mathbf{x}_1, \dots, \mathbf{x}_k\} \subseteq \operatorname{null} A$. Thus we have equality.

Example 5.1.8

Let $\mathbf{c}_1, \mathbf{c}_2, \dots, \mathbf{c}_n$ denote the columns of the $m \times n$ matrix $A$. Then

$$\operatorname{im} A = \operatorname{span}\{\mathbf{c}_1, \mathbf{c}_2, \dots, \mathbf{c}_n\}$$

Solution:

If $\{\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n\}$ is the standard basis of $\mathbb{R}^n$, observe that

$$A\mathbf{e}_j = \mathbf{c}_j \quad \text{for each } j$$

Hence $\mathbf{c}_j$ is in $\operatorname{im} A$ for each $j$, so $\operatorname{span}\{\mathbf{c}_1, \dots, \mathbf{c}_n\} \subseteq \operatorname{im} A$.

Conversely, let $\mathbf{y}$ be in $\operatorname{im} A$, say $\mathbf{y} = A\mathbf{x}$ for some $\mathbf{x}$ in $\mathbb{R}^n$. If
$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$, then Definition 2.5 gives

$$\mathbf{y} = A\mathbf{x} = x_1\mathbf{c}_1 + x_2\mathbf{c}_2 + \cdots + x_n\mathbf{c}_n$$

This shows that $\mathbf{y}$ is in $\operatorname{span}\{\mathbf{c}_1, \dots, \mathbf{c}_n\}$, and the result follows.

GeoGebra Exercise: Linear Combination of vectors

https://www.geogebra.org/m/Q4hT3V5N

1. Set the vectors $\mathbf{u}$ and $\mathbf{v}$, and set the point P as given.
2. Click “start animation” to see which linear combination of $\mathbf{u}$ and $\mathbf{v}$ will produce the vector $\vec{OP}$.
3. Change $\mathbf{u}$ and $\mathbf{v}$. Randomly choose a point P.
4. Click “start animation” to see which linear combination of $\mathbf{u}$ and $\mathbf{v}$ will produce the vector $\vec{OP}$. Write it down.
5. What if we set $\mathbf{u}$ and $\mathbf{v}$ to be parallel? Can you explain what’s happening now? Would you still be able to find a linear combination of $\mathbf{u}$ and $\mathbf{v}$ to produce an arbitrary vector?

# 5.2 Independence and Dimension

Some spanning sets are better than others. If $U = \operatorname{span}\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k\}$ is a subspace of $\mathbb{R}^n$, then every vector in $U$ can be written as a linear combination of the $\mathbf{x}_i$ in at least one way. Our interest here is in spanning sets where each vector in $U$ has exactly one representation as a linear combination of these vectors.

## Linear Independence

Given $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k$ in $\mathbb{R}^n$, suppose that two linear combinations are equal:

$$r_1\mathbf{x}_1 + r_2\mathbf{x}_2 + \cdots + r_k\mathbf{x}_k = s_1\mathbf{x}_1 + s_2\mathbf{x}_2 + \cdots + s_k\mathbf{x}_k$$

We are looking for a condition on the set $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k\}$ of vectors that guarantees that this representation is unique; that is, $r_i = s_i$ for each $i$. Taking all terms to the left side gives

$$(r_1 - s_1)\mathbf{x}_1 + (r_2 - s_2)\mathbf{x}_2 + \cdots + (r_k - s_k)\mathbf{x}_k = \mathbf{0}$$

so the required condition is that this equation forces all the coefficients $r_i - s_i$ to be zero.

Definition 5.3 Linear Independence in $\mathbb{R}^n$

We call a set $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k\}$ of vectors linearly independent (or simply independent) if it satisfies the following condition:

$$\text{If } t_1\mathbf{x}_1 + t_2\mathbf{x}_2 + \cdots + t_k\mathbf{x}_k = \mathbf{0} \text{, then } t_1 = t_2 = \cdots = t_k = 0$$

Theorem 5.2.1

If $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k\}$ is an independent set of vectors in $\mathbb{R}^n$, then every vector in $\operatorname{span}\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k\}$ has a unique representation as a linear combination of the $\mathbf{x}_i$.

It is useful to state the definition of independence in different language. Let us say that a linear combination vanishes if it equals the zero vector, and
call a linear combination trivial if every coefficient is zero. Then the definition of independence can be compactly stated as follows:

A set of vectors is independent if and only if the only linear combination that vanishes is the trivial one.

Hence we have a procedure for checking that a set of vectors is independent:

Independence Test

To verify that a set of vectors in is independent, proceed as follows:

1. Set a linear combination equal to zero: $t_1\mathbf{x}_1 + t_2\mathbf{x}_2 + \cdots + t_k\mathbf{x}_k = \mathbf{0}$.
2. Show that $t_i = 0$ for each $i$ (that is, the linear combination is trivial).

Of course, if some nontrivial linear combination vanishes, the vectors are not independent.
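For column vectors, the independence test can be automated: the combination $t_1\mathbf{x}_1 + \cdots + t_k\mathbf{x}_k = \mathbf{0}$ has only the trivial solution exactly when the matrix formed from the vectors has rank $k$. A Python sketch in exact rational arithmetic; the sample vectors are assumed examples:

```python
# Independence test: k vectors are independent iff the matrix with those
# vectors as rows has rank k.
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows) via Gaussian elimination."""
    M = [[Fraction(a) for a in row] for row in rows]
    r = 0
    for c in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if piv is None:
            continue                       # no pivot in this column
        M[r], M[piv] = M[piv], M[r]        # move the pivot row up
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def independent(vectors):
    """True iff the given vectors are linearly independent."""
    return rank(vectors) == len(vectors)

print(independent([[1, 0, 1], [0, 1, 1]]))             # True
print(independent([[1, 0, 1], [2, 0, 2], [0, 1, 0]]))  # False: second = 2 * first
```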

Example 5.2.1

Determine whether is independent in .

Solution:

Suppose a linear combination vanishes:

Equating corresponding entries gives a system of four equations:

The only solution is the trivial one (please verify), so these vectors are independent by the independence test.

Example 5.2.2

Show that the standard basis $\{\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n\}$ of $\mathbb{R}^n$ is independent.

Solution:

The components of $t_1\mathbf{e}_1 + t_2\mathbf{e}_2 + \cdots + t_n\mathbf{e}_n$ are $t_1, t_2, \dots, t_n$. So the linear combination vanishes if and only if each $t_i = 0$. Hence the independence test applies.

Example 5.2.3

If $\{\mathbf{x}, \mathbf{y}\}$ is independent, show that $\{\mathbf{x} + \mathbf{y}, \mathbf{x} - \mathbf{y}\}$ is also independent.

Solution:

If $s(\mathbf{x} + \mathbf{y}) + t(\mathbf{x} - \mathbf{y}) = \mathbf{0}$, collect terms to get $(s + t)\mathbf{x} + (s - t)\mathbf{y} = \mathbf{0}$. Since $\{\mathbf{x}, \mathbf{y}\}$ is independent this combination must be trivial; that is, $s + t = 0$ and $s - t = 0$. These equations have only the trivial solution $s = t = 0$, as required.

Example 5.2.4

Show that the zero vector in $\mathbb{R}^n$ does not belong to any independent set.

Solution:

No set $\{\mathbf{0}, \mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k\}$ of vectors is independent because we have a vanishing, nontrivial linear combination $1\mathbf{0} + 0\mathbf{x}_1 + 0\mathbf{x}_2 + \cdots + 0\mathbf{x}_k = \mathbf{0}$.

Example 5.2.5

Given $\mathbf{x}$ in $\mathbb{R}^n$, show that $\{\mathbf{x}\}$ is independent if and only if $\mathbf{x} \neq \mathbf{0}$.

Solution:

A vanishing linear combination from $\{\mathbf{x}\}$ takes the form $t\mathbf{x} = \mathbf{0}$, $t$ in $\mathbb{R}$. This implies that $t = 0$ because $\mathbf{x} \neq \mathbf{0}$. Conversely, if $\mathbf{x} = \mathbf{0}$, then $\{\mathbf{x}\}$ is not independent by Example 5.2.4.

Example 5.2.6

Show that the nonzero rows of a row-echelon matrix are independent.

Solution:

We illustrate the case with 3 leading 1s; the general case is analogous. Suppose the row-echelon matrix $R$ has the form

$$R = \begin{bmatrix} 0 & 1 & * & * & * \\ 0 & 0 & 0 & 1 & * \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$

where $*$ indicates a nonspecified number. Let $R_1$, $R_2$, and $R_3$ denote the nonzero rows of $R$. If $t_1R_1 + t_2R_2 + t_3R_3 = 0$, we show that $t_1 = 0$, then $t_2 = 0$, and finally $t_3 = 0$. The condition becomes

$$t_1R_1 + t_2R_2 + t_3R_3 = (0, 0, 0, 0, 0)$$

Equating second entries shows that $t_1 = 0$, so the condition becomes $t_2R_2 + t_3R_3 = 0$. Now the same argument shows that $t_2 = 0$. Finally, this gives $t_3R_3 = 0$ and we obtain $t_3 = 0$.

A set of vectors in $\mathbb{R}^n$ is called linearly dependent (or simply dependent) if it is not linearly independent, equivalently if some nontrivial linear combination vanishes.

Example 5.2.7

If $\mathbf{x}$ and $\mathbf{y}$ are nonzero vectors in $\mathbb{R}^n$, show that $\{\mathbf{x}, \mathbf{y}\}$ is dependent if and only if $\mathbf{x}$ and $\mathbf{y}$ are parallel.

Solution:

If $\mathbf{x}$ and $\mathbf{y}$ are parallel, then one is a scalar multiple of the other, say $\mathbf{y} = a\mathbf{x}$ for some scalar $a$. Then the nontrivial linear combination $a\mathbf{x} - \mathbf{y} = \mathbf{0}$ vanishes, so $\{\mathbf{x}, \mathbf{y}\}$ is dependent.

Conversely, if $\{\mathbf{x}, \mathbf{y}\}$ is dependent, let $s\mathbf{x} + t\mathbf{y} = \mathbf{0}$ be nontrivial, say $s \neq 0$. Then $\mathbf{x} = -\frac{t}{s}\mathbf{y}$, so $\mathbf{x}$ and $\mathbf{y}$ are parallel. A similar argument works if $t \neq 0$.

With this we can give a geometric description of what it means for a set $\{\mathbf{u}, \mathbf{v}, \mathbf{w}\}$ in $\mathbb{R}^3$ to be independent. Note that this requirement means that $\{\mathbf{u}, \mathbf{v}\}$ is also independent ($a\mathbf{u} + b\mathbf{v} = \mathbf{0}$ means that $a\mathbf{u} + b\mathbf{v} + 0\mathbf{w} = \mathbf{0}$), so $\operatorname{span}\{\mathbf{u}, \mathbf{v}\}$ is the plane containing $\mathbf{0}$, $\mathbf{u}$, and $\mathbf{v}$ (see the discussion preceding Example 5.1.4). So we assume that $\{\mathbf{u}, \mathbf{v}\}$ is independent in the following example.

Example 5.2.8

Let $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ be nonzero vectors in $\mathbb{R}^3$ where $\{\mathbf{u}, \mathbf{v}\}$ is independent. Show that $\{\mathbf{u}, \mathbf{v}, \mathbf{w}\}$ is independent if and only if $\mathbf{w}$ is not in the plane $\operatorname{span}\{\mathbf{u}, \mathbf{v}\}$. This is illustrated in the diagrams.

Solution:

If $\{\mathbf{u}, \mathbf{v}, \mathbf{w}\}$ is independent, suppose $\mathbf{w}$ is in the plane $\operatorname{span}\{\mathbf{u}, \mathbf{v}\}$, say $\mathbf{w} = a\mathbf{u} + b\mathbf{v}$, where $a$ and $b$ are in $\mathbb{R}$. Then $a\mathbf{u} + b\mathbf{v} - \mathbf{w} = \mathbf{0}$, contradicting the independence of $\{\mathbf{u}, \mathbf{v}, \mathbf{w}\}$.

On the other hand, suppose that $\mathbf{w}$ is not in $\operatorname{span}\{\mathbf{u}, \mathbf{v}\}$; we must show that $\{\mathbf{u}, \mathbf{v}, \mathbf{w}\}$ is independent. If $a\mathbf{u} + b\mathbf{v} + c\mathbf{w} = \mathbf{0}$ where $a$, $b$, and $c$ are in $\mathbb{R}$, then $c = 0$ since otherwise $\mathbf{w} = -\frac{a}{c}\mathbf{u} - \frac{b}{c}\mathbf{v}$ is in $\operatorname{span}\{\mathbf{u}, \mathbf{v}\}$. But then $a\mathbf{u} + b\mathbf{v} = \mathbf{0}$, so $a = b = 0$ by our assumption. This shows that $\{\mathbf{u}, \mathbf{v}, \mathbf{w}\}$ is independent, as required.

By Theorem 2.4.5, the following conditions are equivalent for an $n \times n$ matrix $A$:

1. $A$ is invertible.
2. If $A\mathbf{x} = \mathbf{0}$ where $\mathbf{x}$ is in $\mathbb{R}^n$, then $\mathbf{x} = \mathbf{0}$.
3. $A\mathbf{x} = \mathbf{b}$ has a solution $\mathbf{x}$ for every vector $\mathbf{b}$ in $\mathbb{R}^n$.

While condition 1 makes no sense if $A$ is not square, conditions 2 and 3 are meaningful for any $m \times n$ matrix $A$ and, in fact, are related to independence and spanning. Indeed, if $\mathbf{c}_1, \mathbf{c}_2, \dots, \mathbf{c}_n$ are the columns of $A$, and if we write
$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$, then

$$A\mathbf{x} = x_1\mathbf{c}_1 + x_2\mathbf{c}_2 + \cdots + x_n\mathbf{c}_n$$

by Definition 2.5. Hence the definitions of independence and spanning show, respectively, that condition 2 is equivalent to the independence of $\{\mathbf{c}_1, \mathbf{c}_2, \dots, \mathbf{c}_n\}$ and condition 3 is equivalent to the requirement that $\operatorname{span}\{\mathbf{c}_1, \mathbf{c}_2, \dots, \mathbf{c}_n\} = \mathbb{R}^m$. This discussion is summarized in the following theorem:

Theorem 5.2.2

If $A$ is an $m \times n$ matrix, let $\mathbf{c}_1, \mathbf{c}_2, \dots, \mathbf{c}_n$ denote the columns of $A$.

1. $\{\mathbf{c}_1, \mathbf{c}_2, \dots, \mathbf{c}_n\}$ is independent in $\mathbb{R}^m$ if and only if $A\mathbf{x} = \mathbf{0}$, $\mathbf{x}$ in $\mathbb{R}^n$, implies $\mathbf{x} = \mathbf{0}$.
2. $\mathbb{R}^m = \operatorname{span}\{\mathbf{c}_1, \mathbf{c}_2, \dots, \mathbf{c}_n\}$ if and only if $A\mathbf{x} = \mathbf{b}$ has a solution $\mathbf{x}$ for every vector $\mathbf{b}$ in $\mathbb{R}^m$.

For a square matrix $A$, Theorem 5.2.2 characterizes the invertibility of $A$ in terms of the spanning and independence of its columns (see the discussion preceding Theorem 5.2.2). It is important to be able to discuss these notions for rows. If $\mathbf{y}_1, \mathbf{y}_2, \dots, \mathbf{y}_k$ are $1 \times n$ rows, we define $\operatorname{span}\{\mathbf{y}_1, \mathbf{y}_2, \dots, \mathbf{y}_k\}$ to be the set of all linear combinations of the $\mathbf{y}_i$ (as matrices), and we say that $\{\mathbf{y}_1, \mathbf{y}_2, \dots, \mathbf{y}_k\}$ is linearly independent if the only vanishing linear combination is the trivial one (that is, if $\{\mathbf{y}_1^T, \mathbf{y}_2^T, \dots, \mathbf{y}_k^T\}$ is independent in $\mathbb{R}^n$, as the reader can verify).

Theorem 5.2.3

The following are equivalent for an $n \times n$ matrix $A$:

1. $A$ is invertible.
2. The columns of $A$ are linearly independent.
3. The columns of $A$ span $\mathbb{R}^n$.
4. The rows of $A$ are linearly independent.
5. The rows of $A$ span the set of all $1 \times n$ rows.

Proof:

Let $\mathbf{c}_1, \mathbf{c}_2, \dots, \mathbf{c}_n$ denote the columns of $A$.

(1) $\Leftrightarrow$ (2). By Theorem 2.4.5, $A$ is invertible if and only if $A\mathbf{x} = \mathbf{0}$ implies $\mathbf{x} = \mathbf{0}$; this holds if and only if $\{\mathbf{c}_1, \dots, \mathbf{c}_n\}$ is independent by Theorem 5.2.2.

(1) $\Leftrightarrow$ (3). Again by Theorem 2.4.5, $A$ is invertible if and only if $A\mathbf{x} = \mathbf{b}$ has a solution for every column $\mathbf{b}$ in $\mathbb{R}^n$; this holds if and only if $\operatorname{span}\{\mathbf{c}_1, \dots, \mathbf{c}_n\} = \mathbb{R}^n$ by Theorem 5.2.2.

(1) $\Leftrightarrow$ (4). The matrix $A$ is invertible if and only if $A^T$ is invertible (by Corollary 2.4.1 to Theorem 2.2.4); this in turn holds if and only if $A^T$ has independent columns (by (1) $\Leftrightarrow$ (2)); finally, this last statement holds if and only if $A$ has independent rows (because the rows of $A$ are the transposes of the columns of $A^T$).

(1) $\Leftrightarrow$ (5). The proof is similar to (1) $\Leftrightarrow$ (4).

Example 5.2.9

Show that is independent in .

Solution:

Consider the matrix $A$ with the given vectors as its rows. A routine computation shows that $\det A \neq 0$, so $A$ is invertible. Hence the set is independent by Theorem 5.2.3. Note that Theorem 5.2.3 also shows that the vectors span $\mathbb{R}^3$.
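The determinant test used here is easy to sketch in code. The $3 \times 3$ matrix below is an assumed example, not the one from the text:

```python
# By Theorem 5.2.3, three vectors in R^3 are independent exactly when the
# matrix having them as rows has nonzero determinant.

def det3(A):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = A
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

A = [[1, 1, 0],
     [1, 0, 1],
     [0, 1, 1]]                # assumed example rows
print(det3(A))                 # -2, nonzero: the rows are independent
```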

## Dimension

It is common geometrical language to say that $\mathbb{R}^3$ is 3-dimensional, that planes are 2-dimensional, and that lines are 1-dimensional. The next theorem is a basic tool for clarifying this idea of “dimension”.

Theorem 5.2.4 Fundamental Theorem

Let $U$ be a subspace of $\mathbb{R}^n$. If $U$ is spanned by $m$ vectors, and if $U$ contains $k$ linearly independent vectors, then $k \leq m$.

Definition 5.4 Basis of $U$

If $U$ is a subspace of $\mathbb{R}^n$, a set $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_m\}$ of vectors in $U$ is called a basis of $U$ if it satisfies the following two conditions:

1. $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_m\}$ is linearly independent.
2. $U = \operatorname{span}\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_m\}$.

Theorem 5.2.5 Invariance Theorem

If $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_m\}$ and $\{\mathbf{y}_1, \mathbf{y}_2, \dots, \mathbf{y}_k\}$ are bases of a subspace $U$ of $\mathbb{R}^n$, then $m = k$.

Proof:

We have $k \leq m$ by the fundamental theorem because $\{\mathbf{x}_1, \dots, \mathbf{x}_m\}$ spans $U$, and $\{\mathbf{y}_1, \dots, \mathbf{y}_k\}$ is independent. Similarly, by interchanging the $\mathbf{x}$‘s and $\mathbf{y}$‘s we get $m \leq k$. Hence $m = k$.

Definition 5.5 Dimension of a Subspace of

If $U$ is a subspace of $\mathbb{R}^n$ and $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_m\}$ is any basis
of $U$, the number, $m$, of vectors in the basis is called the dimension of $U$, denoted

$$\dim U = m$$

The importance of the invariance theorem is that the dimension of $U$ can be determined by counting the number of vectors in any basis.

Let $\{\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n\}$ denote the standard basis of $\mathbb{R}^n$, that is the set of columns of the identity matrix $I_n$. Then $\mathbb{R}^n = \operatorname{span}\{\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n\}$ by Example 5.1.6, and the set is independent by Example 5.2.2. Hence it is indeed a basis of $\mathbb{R}^n$ in the present terminology, and we have

Example 5.2.10

$\dim(\mathbb{R}^n) = n$ and $\{\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n\}$ is a basis.

This agrees with our geometric sense that $\mathbb{R}^2$ is two-dimensional and $\mathbb{R}^3$ is three-dimensional. It also says that $\mathbb{R}^1 = \mathbb{R}$ is one-dimensional, and $\{1\}$ is a basis. Returning to subspaces of $\mathbb{R}^n$, we define

$$\dim\{\mathbf{0}\} = 0$$

This amounts to saying $\{\mathbf{0}\}$ has a basis containing no vectors. This makes sense because $\mathbf{0}$ cannot belong to any independent set (Example 5.2.4).

Example 5.2.11

Let . Show that is a subspace of , find a basis, and calculate .

Solution:

Clearly,
where
and  . It follows that , and hence that is a subspace of . Moreover, if , then
so . Hence is independent, and so a basis of . This means .

While we have found bases in many subspaces of , we have not yet shown that every subspace has a basis.

Theorem 5.2.6

Let $U \neq \{\mathbf{0}\}$ be a subspace of $\mathbb{R}^n$. Then:

1. $U$ has a basis and $\dim U \leq n$.
2. Any independent set in $U$ can be enlarged (by adding vectors from the standard basis) to a basis of $U$.
3. Any spanning set for $U$ can be cut down (by deleting vectors) to a basis of $U$.

Example 5.2.12

Find a basis of $\mathbb{R}^4$ containing $S = \{\mathbf{u}, \mathbf{v}\}$ where $\mathbf{u}$ and $\mathbf{v}$ are the given vectors.

Solution:

By Theorem 5.2.6 we can find such a basis by adding vectors from the standard basis of $\mathbb{R}^4$ to $S$. If we try $\mathbf{e}_1$, we find easily that $\{\mathbf{u}, \mathbf{v}, \mathbf{e}_1\}$ is independent. Now add another vector from the standard basis, say $\mathbf{e}_2$.

Again we find that $B = \{\mathbf{u}, \mathbf{v}, \mathbf{e}_1, \mathbf{e}_2\}$ is independent. Since $B$ has $4$ vectors, then $B$ must span $\mathbb{R}^4$ by Theorem 5.2.7 below (or simply verify it directly). Hence $B$ is a basis of $\mathbb{R}^4$.

Theorem 5.2.7

Let $U$ be a subspace of $\mathbb{R}^n$ where $\dim U = m$ and let $B = \{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_m\}$ be a set of $m$ vectors in $U$. Then $B$ is independent if and only if $B$ spans $U$.

Proof:

Suppose $B$ is independent. If $B$ does not span $U$ then, by Theorem 5.2.6, $B$ can be enlarged to a basis of $U$ containing more than $m$ vectors. This contradicts the invariance theorem because $\dim U = m$, so $B$ spans $U$. Conversely, if $B$ spans $U$ but is not independent, then $B$ can be cut down to a basis of $U$ containing fewer than $m$ vectors, again a contradiction. So $B$ is independent, as required.

Theorem 5.2.8

Let $U \subseteq W$ be subspaces of $\mathbb{R}^n$. Then:

1. $\dim U \leq \dim W$.
2. If $\dim U = \dim W$, then $U = W$.

Proof:

Write $\dim W = k$, and let $B$ be a basis of $U$.

1. If $\dim U > k$, then $B$ is an independent set in $W$ containing more than $k$ vectors, contradicting the fundamental theorem. So $\dim U \leq k = \dim W$.
2. If $\dim U = k$, then $B$ is an independent set in $W$ containing $k = \dim W$ vectors, so $B$ spans $W$ by Theorem 5.2.7. Hence $W = \operatorname{span} B = U$, proving (2).

It follows from Theorem 5.2.8 that if $U$ is a subspace of $\mathbb{R}^n$, then $\dim U$ is one of the integers $0, 1, 2, \dots, n$, and that:

$$\dim U = 0 \text{ if and only if } U = \{\mathbf{0}\} \quad \text{and} \quad \dim U = n \text{ if and only if } U = \mathbb{R}^n$$

The other subspaces of $\mathbb{R}^n$ are called proper. The following example uses Theorem 5.2.8 to show that the proper subspaces of $\mathbb{R}^2$ are the lines through the origin, while the proper subspaces of $\mathbb{R}^3$ are the lines and planes through the origin.

Example 5.2.14

1. If $U$ is a subspace of $\mathbb{R}^2$ or $\mathbb{R}^3$, then $\dim U = 1$ if and only if $U$ is a line through the origin.
2. If $U$ is a subspace of $\mathbb{R}^3$, then $\dim U = 2$ if and only if $U$ is a plane through the origin.

Solution:

1. Since $\dim U = 1$, let $\{\mathbf{d}\}$ be a basis of $U$. Then $U = \operatorname{span}\{\mathbf{d}\} = \{t\mathbf{d} \mid t \in \mathbb{R}\}$, so $U$ is the line through the origin with direction vector $\mathbf{d}$. Conversely each line $L$ with direction vector $\mathbf{d} \neq \mathbf{0}$ has the form $L = \{t\mathbf{d} \mid t \in \mathbb{R}\}$. Hence $\{\mathbf{d}\}$ is a basis of $L$, so $L$ has dimension 1.
2. If $U \subseteq \mathbb{R}^3$ has dimension 2, let $\{\mathbf{v}, \mathbf{w}\}$ be a basis of $U$. Then $\mathbf{v}$ and $\mathbf{w}$ are not parallel (by Example 5.2.7) so $\mathbf{n} = \mathbf{v} \times \mathbf{w} \neq \mathbf{0}$. Let $P = \{\mathbf{x} \in \mathbb{R}^3 \mid \mathbf{n} \cdot \mathbf{x} = 0\}$ denote the plane through the origin with normal $\mathbf{n}$. Then $P$ is a subspace of $\mathbb{R}^3$ (Example 5.1.1) and both $\mathbf{v}$ and $\mathbf{w}$ lie in $P$ (they are orthogonal to $\mathbf{n}$), so $U = \operatorname{span}\{\mathbf{v}, \mathbf{w}\} \subseteq P$ by Theorem 5.1.1. Hence

$$U \subseteq P \subseteq \mathbb{R}^3$$

Since $\dim U = 2$ and $\dim(\mathbb{R}^3) = 3$, it follows from Theorem 5.2.8 that $\dim P = 2$ or $3$, whence $P = U$ or $P = \mathbb{R}^3$. But $P \neq \mathbb{R}^3$ (for example, $\mathbf{n}$ is not in $P$) and so $U = P$ is a plane through the origin.

Conversely, if $U$ is a plane through the origin, then $\dim U = 0$, $1$, $2$, or $3$ by Theorem 5.2.8. But $\dim U \neq 0$ or $3$ because $U \neq \{\mathbf{0}\}$ and $U \neq \mathbb{R}^3$, and $\dim U \neq 1$ by (1). So $\dim U = 2$.

# 5.3 Orthogonality

Length and orthogonality are basic concepts in geometry and, in $\mathbb{R}^2$ and $\mathbb{R}^3$, they both can be defined using the dot product. In this section we extend the dot product to vectors in $\mathbb{R}^n$, and so endow $\mathbb{R}^n$ with euclidean geometry. We then introduce the idea of an orthogonal basis—one of the most useful concepts in linear algebra, and begin exploring some of its applications.

## Dot Product, Length, and Distance

If $\mathbf{x} = (x_1, x_2, \dots, x_n)$ and $\mathbf{y} = (y_1, y_2, \dots, y_n)$ are two $n$-tuples in $\mathbb{R}^n$, recall that their dot product was defined in Section 2.2 as follows:

$$\mathbf{x} \cdot \mathbf{y} = x_1y_1 + x_2y_2 + \cdots + x_ny_n$$

Observe that if $\mathbf{x}$ and $\mathbf{y}$ are written as columns then $\mathbf{x} \cdot \mathbf{y} = \mathbf{x}^T\mathbf{y}$ is a matrix product (and $\mathbf{x} \cdot \mathbf{y} = \mathbf{x}\mathbf{y}^T$ if they are written as rows). Here $\mathbf{x}^T\mathbf{y}$ is a $1 \times 1$ matrix, which we take to be a number.

Definition 5.6 Length in $\mathbb{R}^n$

As in $\mathbb{R}^3$, the length $\|\mathbf{x}\|$ of the vector $\mathbf{x} = (x_1, x_2, \dots, x_n)$ in $\mathbb{R}^n$ is defined by

$$\|\mathbf{x}\| = \sqrt{\mathbf{x} \cdot \mathbf{x}} = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}$$

where $\sqrt{\ }$ indicates the positive square root.

A vector of length $1$ is called a unit vector. If $\mathbf{x} \neq \mathbf{0}$, then $\|\mathbf{x}\| \neq 0$ and it follows easily that $\frac{1}{\|\mathbf{x}\|}\mathbf{x}$
is a unit vector (see Theorem 5.3.6 below), a fact that we shall use later.

Example 5.3.1

If and in , then and . Hence is a unit vector; similarly is a unit vector.
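Computing lengths and normalizing to a unit vector is mechanical. A short Python sketch with an assumed example vector:

```python
# ||x|| = sqrt(x . x); for x != 0, the vector (1/||x||) x is a unit vector.
import math

def norm(x):
    """Euclidean length of a vector given as a list."""
    return math.sqrt(sum(a * a for a in x))

x = [3, 4]                      # assumed example vector
u = [a / norm(x) for a in x]    # normalize x
print(norm(x), norm(u))         # 5.0 and (approximately) 1.0
```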

Theorem 5.3.1

Let $\mathbf{x}$, $\mathbf{y}$, and $\mathbf{z}$ denote vectors in $\mathbb{R}^n$. Then:

1. $\mathbf{x} \cdot \mathbf{y} = \mathbf{y} \cdot \mathbf{x}$.
2. $\mathbf{x} \cdot (\mathbf{y} + \mathbf{z}) = \mathbf{x} \cdot \mathbf{y} + \mathbf{x} \cdot \mathbf{z}$.
3. $(a\mathbf{x}) \cdot \mathbf{y} = a(\mathbf{x} \cdot \mathbf{y}) = \mathbf{x} \cdot (a\mathbf{y})$ for all scalars $a$.
4. $\|\mathbf{x}\|^2 = \mathbf{x} \cdot \mathbf{x}$.
5. $\|\mathbf{x}\| \geq 0$, and $\|\mathbf{x}\| = 0$ if and only if $\mathbf{x} = \mathbf{0}$.
6. $\|a\mathbf{x}\| = |a|\|\mathbf{x}\|$ for all scalars $a$.

Proof:

(1), (2), and (3) follow from matrix arithmetic because $\mathbf{x} \cdot \mathbf{y} = \mathbf{x}^T\mathbf{y}$; (4) is clear from the definition; and (6) is a routine verification since $|a| = \sqrt{a^2}$. If $\mathbf{x} = (x_1, x_2, \dots, x_n)$, then $\|\mathbf{x}\|^2 = x_1^2 + x_2^2 + \cdots + x_n^2$, so $\|\mathbf{x}\| = 0$ if and only if $x_1^2 + x_2^2 + \cdots + x_n^2 = 0$. Since each $x_i$ is a real number this happens if and only if $x_i = 0$ for each $i$; that is, if and only if $\mathbf{x} = \mathbf{0}$. This proves (5).

Because of Theorem 5.3.1, computations with dot products in $\mathbb{R}^n$ are similar to those in $\mathbb{R}^3$. In particular, the dot product

$$(\mathbf{x}_1 + \mathbf{x}_2 + \cdots + \mathbf{x}_m) \cdot (\mathbf{y}_1 + \mathbf{y}_2 + \cdots + \mathbf{y}_k)$$

equals the sum of $mk$ terms, $\mathbf{x}_i \cdot \mathbf{y}_j$, one for each choice of $i$ and $j$. For example:

$$(3\mathbf{x} - 4\mathbf{y}) \cdot (7\mathbf{x} + 2\mathbf{y}) = 21(\mathbf{x} \cdot \mathbf{x}) + 6(\mathbf{x} \cdot \mathbf{y}) - 28(\mathbf{y} \cdot \mathbf{x}) - 8(\mathbf{y} \cdot \mathbf{y}) = 21\|\mathbf{x}\|^2 - 22(\mathbf{x} \cdot \mathbf{y}) - 8\|\mathbf{y}\|^2$$

holds for all vectors $\mathbf{x}$ and $\mathbf{y}$.

Example 5.3.2

Show that $\|\mathbf{x} + \mathbf{y}\|^2 = \|\mathbf{x}\|^2 + 2(\mathbf{x} \cdot \mathbf{y}) + \|\mathbf{y}\|^2$ for any $\mathbf{x}$ and $\mathbf{y}$ in $\mathbb{R}^n$.

Solution:

Using Theorem 5.3.1 several times:

$$\|\mathbf{x} + \mathbf{y}\|^2 = (\mathbf{x} + \mathbf{y}) \cdot (\mathbf{x} + \mathbf{y}) = \mathbf{x} \cdot \mathbf{x} + 2(\mathbf{x} \cdot \mathbf{y}) + \mathbf{y} \cdot \mathbf{y} = \|\mathbf{x}\|^2 + 2(\mathbf{x} \cdot \mathbf{y}) + \|\mathbf{y}\|^2$$

Example 5.3.3

Suppose that $\mathbb{R}^n = \operatorname{span}\{\mathbf{f}_1, \mathbf{f}_2, \dots, \mathbf{f}_k\}$ for some vectors $\mathbf{f}_i$. If $\mathbf{x} \cdot \mathbf{f}_i = 0$ for each $i$ where $\mathbf{x}$ is in $\mathbb{R}^n$, show that $\mathbf{x} = \mathbf{0}$.

Solution:

We show $\mathbf{x} = \mathbf{0}$ by showing that $\|\mathbf{x}\| = 0$ and using (5) of Theorem 5.3.1. Since the $\mathbf{f}_i$ span $\mathbb{R}^n$, write $\mathbf{x} = t_1\mathbf{f}_1 + t_2\mathbf{f}_2 + \cdots + t_k\mathbf{f}_k$ where the $t_i$ are in $\mathbb{R}$. Then

$$\|\mathbf{x}\|^2 = \mathbf{x} \cdot \mathbf{x} = \mathbf{x} \cdot (t_1\mathbf{f}_1 + \cdots + t_k\mathbf{f}_k) = t_1(\mathbf{x} \cdot \mathbf{f}_1) + \cdots + t_k(\mathbf{x} \cdot \mathbf{f}_k) = 0$$

We saw in Section 4.2 that if $\mathbf{x}$ and $\mathbf{y}$ are nonzero vectors in $\mathbb{R}^3$, then $\frac{\mathbf{x} \cdot \mathbf{y}}{\|\mathbf{x}\|\|\mathbf{y}\|} = \cos\theta$ where $\theta$ is the angle between $\mathbf{x}$ and $\mathbf{y}$. Since $|\cos\theta| \leq 1$ for any angle $\theta$, this shows that $|\mathbf{x} \cdot \mathbf{y}| \leq \|\mathbf{x}\|\|\mathbf{y}\|$. In this form the result holds in $\mathbb{R}^n$.

Theorem 5.3.2 Cauchy Inequality

If $\mathbf{x}$ and $\mathbf{y}$ are vectors in $\mathbb{R}^n$, then

$$|\mathbf{x} \cdot \mathbf{y}| \leq \|\mathbf{x}\|\|\mathbf{y}\|$$

Moreover $|\mathbf{x} \cdot \mathbf{y}| = \|\mathbf{x}\|\|\mathbf{y}\|$ if and only if one of $\mathbf{x}$ and $\mathbf{y}$ is a multiple of the other.

Proof:

The inequality holds if $\mathbf{x} = \mathbf{0}$ or $\mathbf{y} = \mathbf{0}$ (in fact it is equality). Otherwise, write $\|\mathbf{x}\| = a$ and $\|\mathbf{y}\| = b$ for convenience. A computation like that preceding Example 5.3.2 gives

$$\|b\mathbf{x} - a\mathbf{y}\|^2 = 2ab(ab - \mathbf{x} \cdot \mathbf{y}) \quad \text{and} \quad \|b\mathbf{x} + a\mathbf{y}\|^2 = 2ab(ab + \mathbf{x} \cdot \mathbf{y}) \tag{5.1}$$

It follows that $ab - \mathbf{x} \cdot \mathbf{y} \geq 0$ and $ab + \mathbf{x} \cdot \mathbf{y} \geq 0$, and hence that $-ab \leq \mathbf{x} \cdot \mathbf{y} \leq ab$. Hence $|\mathbf{x} \cdot \mathbf{y}| \leq ab = \|\mathbf{x}\|\|\mathbf{y}\|$, proving the Cauchy inequality.

If equality holds, then $|\mathbf{x} \cdot \mathbf{y}| = ab$, so $\mathbf{x} \cdot \mathbf{y} = ab$ or $\mathbf{x} \cdot \mathbf{y} = -ab$. Hence Equation (5.1) shows that $b\mathbf{x} - a\mathbf{y} = \mathbf{0}$ or $b\mathbf{x} + a\mathbf{y} = \mathbf{0}$, so one of $\mathbf{x}$ and $\mathbf{y}$ is a multiple of the other (even if $a = 0$ or $b = 0$).

There is an important consequence of the Cauchy inequality. Given $\mathbf{x}$ and $\mathbf{y}$ in $\mathbb{R}^n$, use Example 5.3.2 and the fact that $\mathbf{x} \cdot \mathbf{y} \leq \|\mathbf{x}\|\|\mathbf{y}\|$ to compute

$$\|\mathbf{x} + \mathbf{y}\|^2 = \|\mathbf{x}\|^2 + 2(\mathbf{x} \cdot \mathbf{y}) + \|\mathbf{y}\|^2 \leq \|\mathbf{x}\|^2 + 2\|\mathbf{x}\|\|\mathbf{y}\| + \|\mathbf{y}\|^2 = (\|\mathbf{x}\| + \|\mathbf{y}\|)^2$$

Taking positive square roots gives:

Corollary 5.3.1 Triangle Inequality

If $\mathbf{x}$ and $\mathbf{y}$ are vectors in $\mathbb{R}^n$, then $\|\mathbf{x} + \mathbf{y}\| \leq \|\mathbf{x}\| + \|\mathbf{y}\|$.

The reason for the name comes from the observation that in $\mathbb{R}^3$ the inequality asserts that the sum of the lengths of two sides of a triangle is not less than the length of the third side.
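Both inequalities are easy to check numerically. A Python sketch on assumed sample vectors:

```python
# Cauchy inequality |x . y| <= ||x|| ||y||, and the triangle inequality
# ||x + y|| <= ||x|| + ||y||, checked on assumed vectors in R^4.
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    return math.sqrt(dot(x, x))

x, y = [1, 2, -1, 3], [2, 0, 1, 1]
assert abs(dot(x, y)) <= norm(x) * norm(y)   # Cauchy inequality

s = [a + b for a, b in zip(x, y)]
assert norm(s) <= norm(x) + norm(y)          # triangle inequality
print(abs(dot(x, y)), norm(x) * norm(y))
```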

Definition 5.7 Distance in

If $\mathbf{x}$ and $\mathbf{y}$ are two vectors in $\mathbb{R}^n$, we define the distance $d(\mathbf{x}, \mathbf{y})$ between $\mathbf{x}$ and $\mathbf{y}$ by

$$d(\mathbf{x}, \mathbf{y}) = \|\mathbf{x} - \mathbf{y}\|$$

Theorem 5.3.3

If $\mathbf{x}$, $\mathbf{y}$, and $\mathbf{z}$ are three vectors in $\mathbb{R}^n$ we have:

1. $d(\mathbf{x}, \mathbf{y}) \geq 0$ for all $\mathbf{x}$ and $\mathbf{y}$.
2. $d(\mathbf{x}, \mathbf{y}) = 0$ if and only if $\mathbf{x} = \mathbf{y}$.
3. $d(\mathbf{x}, \mathbf{y}) = d(\mathbf{y}, \mathbf{x})$ for all $\mathbf{x}$ and $\mathbf{y}$.
4. $d(\mathbf{x}, \mathbf{z}) \leq d(\mathbf{x}, \mathbf{y}) + d(\mathbf{y}, \mathbf{z})$ for all $\mathbf{x}$, $\mathbf{y}$, and $\mathbf{z}$. \quad Triangle inequality.

Proof:

(1) and (2) restate part (5) of Theorem 5.3.1 because $d(\mathbf{x}, \mathbf{y}) = \|\mathbf{x} - \mathbf{y}\|$, and (3) follows because $\|-\mathbf{v}\| = \|\mathbf{v}\|$ for every vector $\mathbf{v}$ in $\mathbb{R}^n$. To prove (4) use the Corollary to Theorem 5.3.2:

$$d(\mathbf{x}, \mathbf{z}) = \|\mathbf{x} - \mathbf{z}\| = \|(\mathbf{x} - \mathbf{y}) + (\mathbf{y} - \mathbf{z})\| \leq \|\mathbf{x} - \mathbf{y}\| + \|\mathbf{y} - \mathbf{z}\| = d(\mathbf{x}, \mathbf{y}) + d(\mathbf{y}, \mathbf{z})$$

## Orthogonal Sets and the Expansion Theorem

Definition 5.8 Orthogonal and Orthonormal Sets

We say that two vectors $\mathbf{x}$ and $\mathbf{y}$ in $\mathbb{R}^n$ are orthogonal if $\mathbf{x} \cdot \mathbf{y} = 0$, extending the terminology in $\mathbb{R}^3$ (see Theorem 4.2.3). More generally, a set $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k\}$ of vectors in $\mathbb{R}^n$ is called an orthogonal set if

$$\mathbf{x}_i \cdot \mathbf{x}_j = 0 \text{ for all } i \neq j \quad \text{and} \quad \mathbf{x}_i \neq \mathbf{0} \text{ for all } i$$

Note that $\{\mathbf{x}\}$ is an orthogonal set if $\mathbf{x} \neq \mathbf{0}$. A set of vectors in $\mathbb{R}^n$ is called orthonormal if it is orthogonal and, in addition, each $\mathbf{x}_i$ is a unit vector:

$$\|\mathbf{x}_i\| = 1 \text{ for each } i$$

Example 5.3.4

The standard basis $\{\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n\}$ is an orthonormal set in $\mathbb{R}^n$.

Example 5.3.5

If $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k\}$ is orthogonal, so also is $\{a_1\mathbf{x}_1, a_2\mathbf{x}_2, \dots, a_k\mathbf{x}_k\}$ for any nonzero scalars $a_i$.

If $\mathbf{x} \neq \mathbf{0}$, it follows from item (6) of Theorem 5.3.1 that $\frac{1}{\|\mathbf{x}\|}\mathbf{x}$ is a unit vector, that is it has length $1$.

Definition 5.9 Normalizing an Orthogonal Set

If $\{\mathbf{f}_1, \mathbf{f}_2, \dots, \mathbf{f}_k\}$ is an orthogonal set, then $\left\{\frac{1}{\|\mathbf{f}_1\|}\mathbf{f}_1, \frac{1}{\|\mathbf{f}_2\|}\mathbf{f}_2, \dots, \frac{1}{\|\mathbf{f}_k\|}\mathbf{f}_k\right\}$ is an orthonormal set, and we say that it is the result of normalizing the orthogonal set $\{\mathbf{f}_1, \mathbf{f}_2, \dots, \mathbf{f}_k\}$.

Example 5.3.6

If , , , and
then is an orthogonal set in as is easily verified. After normalizing, the corresponding orthonormal set is
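Normalizing is a one-line operation per vector. A Python sketch with an assumed orthogonal set (not the set from Example 5.3.6, whose entries are not reproduced here):

```python
# Normalizing an orthogonal set: replace each f by (1/||f||) f.
import math

def normalize(v):
    """Scale a nonzero vector to length 1."""
    n = math.sqrt(sum(a * a for a in v))
    return [a / n for a in v]

fs = [[1, 1, 0], [1, -1, 0], [0, 0, 2]]   # assumed orthogonal set in R^3
unit = [normalize(f) for f in fs]

# each normalized vector has length 1, and orthogonality is preserved
print([round(sum(a * a for a in u), 10) for u in unit])
```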

The most important result about orthogonality is Pythagoras’ theorem. Given orthogonal vectors $\mathbf{v}$ and $\mathbf{w}$ in $\mathbb{R}^3$, it asserts that

$$\|\mathbf{v} + \mathbf{w}\|^2 = \|\mathbf{v}\|^2 + \|\mathbf{w}\|^2$$

In this form the result holds for any orthogonal set in $\mathbb{R}^n$.

Theorem 5.3.4 Pythagoras’ Theorem

If $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k\}$ is an orthogonal set in $\mathbb{R}^n$, then

$$\|\mathbf{x}_1 + \mathbf{x}_2 + \cdots + \mathbf{x}_k\|^2 = \|\mathbf{x}_1\|^2 + \|\mathbf{x}_2\|^2 + \cdots + \|\mathbf{x}_k\|^2$$

Proof:

The fact that $\mathbf{x}_i \cdot \mathbf{x}_j = 0$ whenever $i \neq j$ gives

$$\|\mathbf{x}_1 + \cdots + \mathbf{x}_k\|^2 = (\mathbf{x}_1 + \cdots + \mathbf{x}_k) \cdot (\mathbf{x}_1 + \cdots + \mathbf{x}_k) = \mathbf{x}_1 \cdot \mathbf{x}_1 + \cdots + \mathbf{x}_k \cdot \mathbf{x}_k = \|\mathbf{x}_1\|^2 + \cdots + \|\mathbf{x}_k\|^2$$

because all the cross terms vanish. This is what we wanted.

If $\mathbf{v}$ and $\mathbf{w}$ are orthogonal, nonzero vectors in $\mathbb{R}^3$, then they are certainly not parallel, and so are linearly independent (Example 5.2.7). The next theorem gives a far-reaching extension of this observation.

Theorem 5.3.5

Every orthogonal set in $\mathbb{R}^n$ is linearly independent.

Proof:

Let $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k\}$ be an orthogonal set in $\mathbb{R}^n$ and suppose a linear combination vanishes, say: $t_1\mathbf{x}_1 + t_2\mathbf{x}_2 + \cdots + t_k\mathbf{x}_k = \mathbf{0}$. Then

$$0 = \mathbf{x}_1 \cdot \mathbf{0} = \mathbf{x}_1 \cdot (t_1\mathbf{x}_1 + t_2\mathbf{x}_2 + \cdots + t_k\mathbf{x}_k) = t_1\|\mathbf{x}_1\|^2$$

Since $\|\mathbf{x}_1\|^2 \neq 0$, this implies that $t_1 = 0$. Similarly $t_i = 0$ for each $i$.

Theorem 5.3.6

Let $\{\mathbf{f}_1, \mathbf{f}_2, \dots, \mathbf{f}_m\}$ be an orthogonal basis of a subspace $U$ of $\mathbb{R}^n$. If $\mathbf{x}$ is any vector in $U$, we have

$$\mathbf{x} = \frac{\mathbf{x} \cdot \mathbf{f}_1}{\|\mathbf{f}_1\|^2}\mathbf{f}_1 + \frac{\mathbf{x} \cdot \mathbf{f}_2}{\|\mathbf{f}_2\|^2}\mathbf{f}_2 + \cdots + \frac{\mathbf{x} \cdot \mathbf{f}_m}{\|\mathbf{f}_m\|^2}\mathbf{f}_m$$

Proof:

Since $\{\mathbf{f}_1, \dots, \mathbf{f}_m\}$ spans $U$, we have $\mathbf{x} = t_1\mathbf{f}_1 + t_2\mathbf{f}_2 + \cdots + t_m\mathbf{f}_m$ where the $t_i$ are scalars. To find $t_1$ we take the dot product of both sides with $\mathbf{f}_1$:

$$\mathbf{x} \cdot \mathbf{f}_1 = (t_1\mathbf{f}_1 + \cdots + t_m\mathbf{f}_m) \cdot \mathbf{f}_1 = t_1\|\mathbf{f}_1\|^2$$

Since $\|\mathbf{f}_1\|^2 \neq 0$, this gives $t_1 = \frac{\mathbf{x} \cdot \mathbf{f}_1}{\|\mathbf{f}_1\|^2}$. Similarly, $t_i = \frac{\mathbf{x} \cdot \mathbf{f}_i}{\|\mathbf{f}_i\|^2}$ for each $i$.

The expansion in Theorem 5.3.6 of $\mathbf{x}$ as a linear combination of the orthogonal basis $\{\mathbf{f}_1, \dots, \mathbf{f}_m\}$ is called the Fourier expansion of $\mathbf{x}$, and the coefficients $t_i = \frac{\mathbf{x} \cdot \mathbf{f}_i}{\|\mathbf{f}_i\|^2}$ are called the Fourier coefficients. Note that if $\{\mathbf{f}_1, \dots, \mathbf{f}_m\}$ is actually orthonormal, then $t_i = \mathbf{x} \cdot \mathbf{f}_i$ for each $i$.

Example 5.3.7

Expand $\mathbf{x}$ as a linear combination of the orthogonal basis given in Example 5.3.6.

Solution:

We have , , , and so the Fourier coefficients are

The reader can verify that indeed .
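Computing Fourier coefficients with respect to an orthogonal basis is a direct translation of Theorem 5.3.6. A Python sketch in exact arithmetic; the orthogonal basis below is an assumed example, not the one from Example 5.3.6:

```python
# Fourier expansion: x = sum_i ((x . fi) / ||fi||^2) fi for an orthogonal basis {fi}.
from fractions import Fraction

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def fourier_coefficients(x, basis):
    """Coefficients (x . fi) / (fi . fi) for an orthogonal basis."""
    return [Fraction(dot(x, f), dot(f, f)) for f in basis]

basis = [[1, 1, 0], [1, -1, 0], [0, 0, 1]]    # assumed orthogonal basis of R^3
x = [3, 1, 2]
coeffs = fourier_coefficients(x, basis)        # the coefficients are 2, 1, 2 here

# reconstruct x from the expansion to verify the theorem numerically
recon = [sum(c * f[j] for c, f in zip(coeffs, basis)) for j in range(3)]
print(coeffs, recon)                           # recon recovers x
```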

# 5.4 Rank of a Matrix

In this section we use the concept of dimension to clarify the definition of the rank of a matrix given in Section 1.2, and to study its properties. This requires that we deal with rows and columns in the same way. While it has been the custom to write the $n$-tuples in $\mathbb{R}^n$ as columns, in this section we will frequently write them as rows. Subspaces, independence, spanning, and dimension are defined for rows using matrix operations, just as for columns. If $A$ is an $m \times n$ matrix, we define:

Definition 5.10 Column and Row Space of a Matrix

The column space, $\operatorname{col} A$, of $A$ is the subspace of $\mathbb{R}^m$ spanned by the columns of $A$.

The row space, $\operatorname{row} A$, of $A$ is the subspace of $\mathbb{R}^n$ spanned by the rows of $A$.

Lemma 5.4.1

Let $A$ and $B$ denote $m \times n$ matrices.

1. If $A \to B$ by elementary row operations, then $\operatorname{row} A = \operatorname{row} B$.
2. If $A \to B$ by elementary column operations, then $\operatorname{col} A = \operatorname{col} B$.

Proof:

We prove (1); the proof of (2) is analogous. It is enough to do it in the case when $A \to B$ by a single row operation. Let $R_1, R_2, \dots, R_m$ denote the rows of $A$. The row operation either interchanges two rows, multiplies a row by a nonzero constant, or adds a multiple of a row to a different row. We leave the first two cases to the reader. In the last case, suppose that $a$ times row $p$ is added to row $q$ where $p < q$. Then the rows of $B$ are $R_1, \dots, R_q + aR_p, \dots, R_m$, and Theorem 5.1.1 shows that

$$\operatorname{span}\{R_1, \dots, R_q, \dots, R_m\} = \operatorname{span}\{R_1, \dots, R_q + aR_p, \dots, R_m\}$$

That is, $\operatorname{row} A = \operatorname{row} B$.

If $A$ is any matrix, we can carry $A \to R$ by elementary row operations where $R$ is a row-echelon matrix. Hence $\operatorname{row} A = \operatorname{row} R$ by Lemma 5.4.1; so the first part of the following result is of interest.

Lemma 5.4.2

If $R$ is a row-echelon matrix, then

1. The nonzero rows of $R$ are a basis of $\operatorname{row} R$.
2. The columns of $R$ containing leading ones are a basis of $\operatorname{col} R$.

Proof:

The nonzero rows of $R$ are independent by Example 5.2.6, and they span $\operatorname{row} R$ by definition. This proves (1).

Let $\mathbf{c}_{j_1}, \mathbf{c}_{j_2}, \dots, \mathbf{c}_{j_r}$ denote the columns of $R$ containing leading 1s. Then $\{\mathbf{c}_{j_1}, \mathbf{c}_{j_2}, \dots, \mathbf{c}_{j_r}\}$ is independent because the leading 1s are in different rows (and have zeros below and to the left of them). Let $U$ denote the subspace of all columns in $\mathbb{R}^m$ in which the last $m - r$ entries are zero. Then $\dim U = r$ (it is just $\mathbb{R}^r$ with extra zeros). Hence the independent set $\{\mathbf{c}_{j_1}, \dots, \mathbf{c}_{j_r}\}$ is a basis of $U$ by Theorem 5.2.7. Since each $\mathbf{c}_j$ is in $U$, it follows that $\operatorname{col} R = U$, proving (2).

Let $A$ be any matrix and suppose $A$ is carried to some row-echelon matrix $R$ by row operations. Note that $R$ is not unique. In Section 1.2 we defined the rank of $A$, denoted $\operatorname{rank} A$, to be the number of leading 1s in $R$, that is the number of nonzero rows of $R$. The fact that this number does not depend on the choice of $R$ was not proved. However part 1 of Lemma 5.4.2 shows that

$$\operatorname{rank} A = \dim(\operatorname{row} A)$$

and hence that $\operatorname{rank} A$ is independent of $R$.

Lemma 5.4.2 can be used to find bases of subspaces of (written as rows). Here is an example.

Example 5.4.1

Find a basis of .

Solution:

is the row space of . This matrix has row-echelon form , so is basis of by Lemma 5.4.1.

Note that is another basis that avoids fractions.

Theorem 5.4.1 Rank Theorem

Let $A$ denote any $m \times n$ matrix of rank $r$. Then

$$\dim(\operatorname{col} A) = \dim(\operatorname{row} A) = r$$

Moreover, if $A$ is carried to a row-echelon matrix $R$ by row operations, then

1. The $r$ nonzero rows of $R$ are a basis of $\operatorname{row} A$.
2. If the leading 1s lie in columns $j_1, j_2, \dots, j_r$ of $R$, then columns $j_1, j_2, \dots, j_r$ of $A$ are a basis of $\operatorname{col} A$.

Proof:

We have $\operatorname{row} A = \operatorname{row} R$ by Lemma 5.4.1, so (1) follows from Lemma 5.4.2. Moreover, $R = UA$ for some invertible matrix $U$. Now write $A = \begin{bmatrix} \mathbf{c}_1 & \mathbf{c}_2 & \cdots & \mathbf{c}_n \end{bmatrix}$ where $\mathbf{c}_1, \mathbf{c}_2, \dots, \mathbf{c}_n$ are the columns of $A$. Then

$$R = UA = \begin{bmatrix} U\mathbf{c}_1 & U\mathbf{c}_2 & \cdots & U\mathbf{c}_n \end{bmatrix}$$

Thus, in the notation of (2), the set $B = \{U\mathbf{c}_{j_1}, \dots, U\mathbf{c}_{j_r}\}$ is a basis of $\operatorname{col} R$ by Lemma 5.4.2. So, to prove (2) and the fact that $\dim(\operatorname{col} A) = r$, it is enough to show that $D = \{\mathbf{c}_{j_1}, \dots, \mathbf{c}_{j_r}\}$ is a basis of $\operatorname{col} A$. First, $D$ is linearly independent because $U$ is invertible (verify), so we show that, for each $j$, column $\mathbf{c}_j$ is a linear combination of the vectors in $D$. But $U\mathbf{c}_j$ is column $j$ of $R$, and so is a linear combination of the $U\mathbf{c}_{j_i}$, say $U\mathbf{c}_j = a_1U\mathbf{c}_{j_1} + a_2U\mathbf{c}_{j_2} + \cdots + a_rU\mathbf{c}_{j_r}$ where each $a_i$ is a real number.

Since $U$ is invertible, it follows that $\mathbf{c}_j = a_1\mathbf{c}_{j_1} + a_2\mathbf{c}_{j_2} + \cdots + a_r\mathbf{c}_{j_r}$
and the proof is complete.

Example 5.4.2

Compute the rank of  and find bases for and .

Solution:

The reduction of to row-echelon form is as follows:

Hence $\operatorname{rank} A = 2$, and the nonzero rows of the row-echelon matrix
are a basis of $\operatorname{row} A$ by Lemma 5.4.2. Since the leading 1s are in columns 1 and 3 of the row-echelon matrix, Theorem 5.4.1 shows that columns 1 and 3 of $A$ are a basis
of $\operatorname{col} A$.
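The equality $\dim(\operatorname{col} A) = \dim(\operatorname{row} A)$, i.e. $\operatorname{rank} A = \operatorname{rank} A^T$, can be checked mechanically. A Python sketch in exact arithmetic; the matrix is an assumed example, not the one from Example 5.4.2:

```python
# rank A = dim(row A) = dim(col A), so rank A = rank of the transpose.
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows) via Gaussian elimination."""
    M = [[Fraction(a) for a in row] for row in rows]
    r = 0
    for c in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

A = [[1, 2, 0, 1],
     [2, 4, 1, 3],
     [3, 6, 1, 4]]                 # assumed example; third row = first + second
AT = [list(col) for col in zip(*A)]
print(rank(A), rank(AT))           # both ranks are 2
```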

Corollary 5.4.1

If $A$ is any matrix, then $\operatorname{rank} A = \operatorname{rank}(A^T)$.

If $A$ is an $m \times n$ matrix, we have $\operatorname{col} A \subseteq \mathbb{R}^m$ and $\operatorname{row} A \subseteq \mathbb{R}^n$. Hence Theorem 5.2.8 shows that $\dim(\operatorname{col} A) \leq m$ and $\dim(\operatorname{row} A) \leq n$. Thus Theorem 5.4.1 gives:

Corollary 5.4.2

If $A$ is an $m \times n$ matrix, then $\operatorname{rank} A \leq m$ and $\operatorname{rank} A \leq n$.

Corollary 5.4.3

$\operatorname{rank} A = \operatorname{rank}(UA) = \operatorname{rank}(AV)$ whenever $U$ and $V$ are invertible.

Proof:

Lemma 5.4.1 gives $\operatorname{rank} A = \operatorname{rank}(UA)$. Using this and Corollary 5.4.1 we get

$$\operatorname{rank}(AV) = \operatorname{rank}\left[(AV)^T\right] = \operatorname{rank}(V^TA^T) = \operatorname{rank}(A^T) = \operatorname{rank} A$$

Lemma 5.4.3

Let $A$, $U$, and $V$ be matrices of sizes $m \times n$, $p \times m$,
and $n \times q$ respectively.

1. $\operatorname{col}(AV) \subseteq \operatorname{col} A$, with equality if $VV' = I_n$ for some $V'$.
2. $\operatorname{row}(UA) \subseteq \operatorname{row} A$, with equality if $U'U = I_m$ for some $U'$.

Proof:

For (1), write $V = \begin{bmatrix} \mathbf{v}_1 & \mathbf{v}_2 & \cdots & \mathbf{v}_q \end{bmatrix}$ where $\mathbf{v}_j$ is column $j$ of $V$. Then we have $AV = \begin{bmatrix} A\mathbf{v}_1 & A\mathbf{v}_2 & \cdots & A\mathbf{v}_q \end{bmatrix}$, and each $A\mathbf{v}_j$ is in $\operatorname{col} A$ by Definition 2.4. It follows that $\operatorname{col}(AV) \subseteq \operatorname{col} A$. If $VV' = I_n$, we obtain $\operatorname{col} A = \operatorname{col}\left[(AV)V'\right] \subseteq \operatorname{col}(AV)$ in the same way. This proves (1).

As to (2), we have $\operatorname{col}\left[(UA)^T\right] = \operatorname{col}(A^TU^T) \subseteq \operatorname{col}(A^T)$ by (1), from which $\operatorname{row}(UA) \subseteq \operatorname{row} A$. If $U'U = I_m$, this is equality as in the proof of (1).

Corollary 5.4.4

If $A$ is $m \times n$ and $B$ is $n \times m$, then $\operatorname{rank}(AB) \leq \operatorname{rank} A$ and $\operatorname{rank}(AB) \leq \operatorname{rank} B$.

Proof:

By Lemma 5.4.3, $\operatorname{col}(AB) \subseteq \operatorname{col} A$ and $\operatorname{row}(AB) \subseteq \operatorname{row} B$, so Theorem 5.4.1 applies.

In Section 5.1 we discussed two other subspaces associated with an $m \times n$ matrix $A$: the null space $\operatorname{null} A = \{\mathbf{x} \in \mathbb{R}^n \mid A\mathbf{x} = \mathbf{0}\}$ and the image space $\operatorname{im} A = \{A\mathbf{x} \mid \mathbf{x} \in \mathbb{R}^n\}$.

Using rank, there are simple ways to find bases of these spaces. If $A$ has rank $r$, we have $\operatorname{im} A = \operatorname{col} A$ by Example 5.1.8, so $\dim(\operatorname{im} A) = \dim(\operatorname{col} A) = r$. Hence Theorem 5.4.1 provides a method of finding a basis of $\operatorname{im} A$. This is recorded as part (2) of the following theorem.

Theorem 5.4.2

Let $A$ denote an $m \times n$ matrix of rank $r$. Then

1. The $n - r$ basic solutions to the system $A\mathbf{x} = \mathbf{0}$ provided by the gaussian algorithm are a basis of $\operatorname{null} A$, so $\dim(\operatorname{null} A) = n - r$.
2. Theorem 5.4.1 provides a basis of $\operatorname{im} A = \operatorname{col} A$, and $\dim(\operatorname{im} A) = r$.

Proof:

It remains to prove (1). We already know (Theorem 2.2.1) that $\operatorname{null} A$ is spanned by the $n - r$ basic solutions of $A\mathbf{x} = \mathbf{0}$. Hence using Theorem 5.2.7, it suffices to show that $\dim(\operatorname{null} A) = n - r$. So let $\{\mathbf{x}_1, \dots, \mathbf{x}_k\}$ be a basis of $\operatorname{null} A$, and extend it to a basis $\{\mathbf{x}_1, \dots, \mathbf{x}_k, \mathbf{x}_{k+1}, \dots, \mathbf{x}_n\}$ of $\mathbb{R}^n$ (by Theorem 5.2.6). It is enough to show that $\{A\mathbf{x}_{k+1}, \dots, A\mathbf{x}_n\}$ is a basis of $\operatorname{im} A$; then $n - k = r$ by the above and so $k = n - r$ as required.

Spanning. Choose $\mathbf{b}$ in $\operatorname{im} A$, say $\mathbf{b} = A\mathbf{y}$, $\mathbf{y}$ in $\mathbb{R}^n$, and write $\mathbf{y} = a_1\mathbf{x}_1 + \cdots + a_n\mathbf{x}_n$ where the $a_i$ are in $\mathbb{R}$.
Then $\mathbf{b} = A\mathbf{y} = a_{k+1}A\mathbf{x}_{k+1} + \cdots + a_nA\mathbf{x}_n$ because $A\mathbf{x}_1 = \cdots = A\mathbf{x}_k = \mathbf{0}$.

Independence. Let $t_{k+1}A\mathbf{x}_{k+1} + \cdots + t_nA\mathbf{x}_n = \mathbf{0}$, $t_i$ in $\mathbb{R}$. Then $t_{k+1}\mathbf{x}_{k+1} + \cdots + t_n\mathbf{x}_n$ is in $\operatorname{null} A$, so $t_{k+1}\mathbf{x}_{k+1} + \cdots + t_n\mathbf{x}_n = s_1\mathbf{x}_1 + \cdots + s_k\mathbf{x}_k$ for some $s_1, \dots, s_k$ in $\mathbb{R}$. But then the independence of the $\mathbf{x}_i$ shows that $t_i = 0$ for every $i$.

Example 5.4.3

If , find bases of and , and so find their dimensions.

Solution:

If is in , then , so is given by solving the system . The reduction of the augmented matrix to reduced form is

Hence . Here, has basis
by Theorem 5.4.1 because the leading s are in columns 1 and 3. In particular, as in Theorem 5.4.2.

Turning to , we use gaussian elimination. The leading variables are and , so the nonleading variables become parameters: and . It follows from the reduced matrix that and , so the general solution is

Hence . But and are solutions (basic), so

However Theorem 5.4.2 asserts that is a basis of . (In fact it is easy to verify directly that is independent in this case.) In particular, .
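The counting in Theorem 5.4.2 can be observed computationally: the basic solutions produced from the reduced row-echelon form number $n - r$, and each satisfies $A\mathbf{x} = \mathbf{0}$. A Python sketch in exact arithmetic on an assumed matrix:

```python
# Basic solutions of A x = 0: one per free (non-leading) variable,
# read off from the reduced row-echelon form of A.
from fractions import Fraction

def rref(rows):
    """Reduced row-echelon form and the list of pivot columns."""
    M = [[Fraction(a) for a in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        p = M[r][c]
        M[r] = [a / p for a in M[r]]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
    return M, pivots

def basic_solutions(rows):
    """One basic solution of A x = 0 per free variable."""
    M, pivots = rref(rows)
    n = len(rows[0])
    sols = []
    for fcol in (c for c in range(n) if c not in pivots):
        x = [Fraction(0)] * n
        x[fcol] = Fraction(1)
        for r, p in enumerate(pivots):
            x[p] = -M[r][fcol]
        sols.append(x)
    return sols

A = [[1, 2, 0, 1],
     [2, 4, 1, 3]]             # assumed example with rank 2, n = 4
sols = basic_solutions(A)
print(len(sols))               # 2 basic solutions: dim(null A) = n - rank A
```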

Let $A$ be an $m \times n$ matrix. Corollary 5.4.2 asserts that $\operatorname{rank} A \leq m$ and $\operatorname{rank} A \leq n$, and it is natural to ask when these extreme cases arise. If $\mathbf{c}_1, \mathbf{c}_2, \dots, \mathbf{c}_n$ are the columns of $A$, Theorem 5.2.2 shows that $\{\mathbf{c}_1, \dots, \mathbf{c}_n\}$ spans $\mathbb{R}^m$ if and only if the system $A\mathbf{x} = \mathbf{b}$ is consistent for every $\mathbf{b}$ in $\mathbb{R}^m$, and that $\{\mathbf{c}_1, \dots, \mathbf{c}_n\}$ is independent if and only if $A\mathbf{x} = \mathbf{0}$, $\mathbf{x}$ in $\mathbb{R}^n$, implies $\mathbf{x} = \mathbf{0}$. The next two useful theorems improve on both these results, and relate them to when the rank of $A$ is $n$ or $m$.

Theorem 5.4.3

The following are equivalent for an $m \times n$ matrix $A$:

1. $\operatorname{rank} A = n$.
2. The rows of $A$ span $\mathbb{R}^n$.
3. The columns of $A$ are linearly independent in $\mathbb{R}^m$.
4. The $n \times n$ matrix $A^TA$ is invertible.
5. $CA = I_n$ for some $n \times m$ matrix $C$.
6. If $A\mathbf{x} = \mathbf{0}$, $\mathbf{x}$ in $\mathbb{R}^n$, then $\mathbf{x} = \mathbf{0}$.

Proof:

(1) $\Rightarrow$ (2). We have $\text{row } A \subseteq \mathbb{R}^n$, and $\dim(\text{row } A) = n$ by (1), so $\text{row } A = \mathbb{R}^n$ by Theorem 5.2.8. This is (2).

(2) $\Rightarrow$ (3). By (2), $\text{row } A = \mathbb{R}^n$, so $\operatorname{rank} A = n$. This means $\dim(\text{col } A) = n$. Since the $n$ columns of $A$ span $\text{col } A$, they are independent by Theorem 5.2.7.

(3) $\Rightarrow$ (4). If $(A^TA)\mathbf{x} = \mathbf{0}$, $\mathbf{x}$ in $\mathbb{R}^n$, we show that $\mathbf{x} = \mathbf{0}$ (Theorem 2.4.5). We have
$$\|A\mathbf{x}\|^2 = (A\mathbf{x})^T(A\mathbf{x}) = \mathbf{x}^TA^TA\mathbf{x} = \mathbf{x}^T\mathbf{0} = 0$$
Hence $A\mathbf{x} = \mathbf{0}$, so $\mathbf{x} = \mathbf{0}$ by (3) and Theorem 5.2.2.

(4) $\Rightarrow$ (5). Given (4), take $C = (A^TA)^{-1}A^T$.

(5) $\Rightarrow$ (6). If $A\mathbf{x} = \mathbf{0}$, then left multiplication by $C$ (from (5)) gives $\mathbf{x} = CA\mathbf{x} = C\mathbf{0} = \mathbf{0}$.

(6) $\Rightarrow$ (1). Given (6), the columns of $A$ are independent by Theorem 5.2.2. Hence $\dim(\text{col } A) = n$, and (1) follows.
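The equivalence of (4) and (5) suggests a concrete recipe: when the columns of $A$ are independent, $C = (A^TA)^{-1}A^T$ is a left inverse of $A$. A NumPy sketch with a tall matrix of our own choosing:

```python
import numpy as np

# A tall matrix with independent columns (rank = n = 2), chosen for illustration.
A = np.array([[1., 0.],
              [1., 1.],
              [0., 2.]])

n = A.shape[1]
assert np.linalg.matrix_rank(A) == n          # condition (1): rank A = n

G = A.T @ A                                   # condition (4): the n x n matrix A^T A
assert abs(np.linalg.det(G)) > 1e-12          # ... is invertible

C = np.linalg.inv(G) @ A.T                    # condition (5): C = (A^T A)^{-1} A^T
assert np.allclose(C @ A, np.eye(n))          # CA = I_n, a left inverse
```

Note that $C$ is a left inverse only: $AC$ is $3 \times 3$ of rank 2, so it cannot equal $I_3$.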

Theorem 5.4.4

The following are equivalent for an $m \times n$ matrix $A$:

1. $\operatorname{rank} A = m$.
2. The columns of $A$ span $\mathbb{R}^m$.
3. The rows of $A$ are linearly independent in $\mathbb{R}^n$.
4. The $m \times m$ matrix $AA^T$ is invertible.
5. $AC = I_m$ for some $n \times m$ matrix $C$.
6. The system $A\mathbf{x} = \mathbf{b}$ is consistent for every $\mathbf{b}$ in $\mathbb{R}^m$.

Proof:

(1) $\Rightarrow$ (2). By (1), $\dim(\text{col } A) = m$, so $\text{col } A = \mathbb{R}^m$ by Theorem 5.2.8.

(2) $\Rightarrow$ (3). By (2), $\text{col } A = \mathbb{R}^m$, so $\operatorname{rank} A = m$. This means $\dim(\text{row } A) = m$. Since the $m$ rows of $A$ span $\text{row } A$, they are independent by Theorem 5.2.7.

(3) $\Rightarrow$ (4). We have $\operatorname{rank} A = m$ by (3), so the $n \times m$ matrix $A^T$ has rank $m$. Hence applying Theorem 5.4.3 to $A^T$ in place of $A$ shows that $(A^T)^TA^T = AA^T$ is invertible, proving (4).

(4) $\Rightarrow$ (5). Given (4), take $C = A^T(AA^T)^{-1}$ in (5).

(5) $\Rightarrow$ (6). Comparing columns in $AC = I_m$ gives $A\mathbf{c}_j = \mathbf{e}_j$ for each $j$, where $\mathbf{e}_j$ and $\mathbf{c}_j$ denote column $j$ of $I_m$ and $C$ respectively. Given $\mathbf{b}$ in $\mathbb{R}^m$, write $\mathbf{b} = r_1\mathbf{e}_1 + r_2\mathbf{e}_2 + \dots + r_m\mathbf{e}_m$, $r_j$ in $\mathbb{R}$. Then $A\mathbf{x} = \mathbf{b}$ holds with $\mathbf{x} = r_1\mathbf{c}_1 + r_2\mathbf{c}_2 + \dots + r_m\mathbf{c}_m$, as the reader can verify.

(6) $\Rightarrow$ (1). Given (6), the columns of $A$ span $\mathbb{R}^m$ by Theorem 5.2.2. Thus $\text{col } A = \mathbb{R}^m$ and (1) follows.
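Dually, when the rows of $A$ are independent, $C = A^T(AA^T)^{-1}$ is a right inverse, and $\mathbf{x} = C\mathbf{b}$ solves $A\mathbf{x} = \mathbf{b}$ for every $\mathbf{b}$, as in the proof of (5) $\Rightarrow$ (6). A sketch (matrix and $\mathbf{b}$ ours):

```python
import numpy as np

# A wide matrix with independent rows (rank = m = 2), chosen for illustration.
A = np.array([[1., 0., 1.],
              [0., 1., 1.]])

m = A.shape[0]
assert np.linalg.matrix_rank(A) == m          # condition (1): rank A = m

G = A @ A.T                                   # condition (4): the m x m matrix A A^T
C = A.T @ np.linalg.inv(G)                    # condition (5): C = A^T (A A^T)^{-1}
assert np.allclose(A @ C, np.eye(m))          # AC = I_m, a right inverse

# Condition (6): A x = b is consistent for every b, with x = C b one solution.
b = np.array([3., -1.])
x = C @ b
assert np.allclose(A @ x, b)
```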

# 5.5 Similarity and Diagonalization

## Similar Matrices

Definition 5.11 Similar Matrices

If $A$ and $B$ are $n \times n$ matrices, we say that $A$ and $B$ are similar, and write $A \sim B$, if $B = P^{-1}AP$ for some invertible matrix $P$.

Note that $A \sim B$ if and only if $B = QAQ^{-1}$ where $Q$ is invertible (write $Q = P^{-1}$). The language of similarity is used throughout linear algebra. For example, a matrix $A$ is diagonalizable if and only if it is similar to a diagonal matrix.

If $A \sim B$, then necessarily $B \sim A$. To see why, suppose that $B = P^{-1}AP$. Then $A = PBP^{-1} = Q^{-1}BQ$ where $Q = P^{-1}$ is invertible. This proves the second of the following properties of similarity:

1. $A \sim A$ for every square matrix $A$.
2. If $A \sim B$, then $B \sim A$.
3. If $A \sim B$ and $B \sim C$, then $A \sim C$. (5.2)

These properties are often expressed by saying that the similarity relation $\sim$ is an equivalence relation on the set of $n \times n$ matrices. Here is an example showing how these properties are used.

Example 5.5.1

If $A$ is similar to $B$ and either $A$ or $B$ is diagonalizable, show that the other is also diagonalizable.

Solution:

We have $A \sim B$. Suppose that $A$ is diagonalizable, say $A \sim D$ where $D$ is diagonal. Since $B \sim A$ by (2) of (5.2), we have $B \sim A$ and $A \sim D$. Hence $B \sim D$ by (3) of (5.2), so $B$ is diagonalizable too. An analogous argument works if we assume instead that $B$ is diagonalizable.

Similarity is compatible with inverses, transposes, and powers: if $A \sim B$, then $A^{-1} \sim B^{-1}$ (when the inverses exist), $A^T \sim B^T$, and $A^k \sim B^k$ for every integer $k \ge 1$.
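These compatibilities are easy to confirm numerically. The random matrices below are our own illustration (a generic random $P$ is invertible):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))     # random matrices, for illustration only;
P = rng.standard_normal((3, 3))     # a generic random P is invertible
Pinv = np.linalg.inv(P)
B = Pinv @ A @ P                    # so B ~ A

# Powers: B^k = P^{-1} A^k P, hence A^k ~ B^k.
assert np.allclose(np.linalg.matrix_power(B, 3),
                   Pinv @ np.linalg.matrix_power(A, 3) @ P)

# Transposes: B^T = P^T A^T (P^T)^{-1}, hence A^T ~ B^T.
assert np.allclose(B.T, P.T @ A.T @ np.linalg.inv(P.T))

# Inverses: when A is invertible, B^{-1} = P^{-1} A^{-1} P, hence A^{-1} ~ B^{-1}.
assert np.allclose(np.linalg.inv(B), Pinv @ np.linalg.inv(A) @ P)
```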

Definition 5.12 Trace of a matrix

The trace $\operatorname{tr} A$ of an $n \times n$ matrix $A$ is defined to be the sum of the main diagonal elements of $A$.

In other words, $\operatorname{tr} A = a_{11} + a_{22} + \dots + a_{nn}$.

It is evident that $\operatorname{tr}(A + B) = \operatorname{tr} A + \operatorname{tr} B$ and that $\operatorname{tr}(cA) = c \operatorname{tr} A$ holds for all $n \times n$ matrices $A$ and $B$ and all scalars $c$. The following fact is more surprising.

Lemma 5.5.1

Let $A$ and $B$ be $n \times n$ matrices. Then $\operatorname{tr}(AB) = \operatorname{tr}(BA)$.

Proof:

Write $A = [a_{ij}]$ and $B = [b_{ij}]$. For each $i$, the $(i, i)$-entry $d_i$ of the matrix $AB$ is given as follows: $d_i = a_{i1}b_{1i} + a_{i2}b_{2i} + \dots + a_{in}b_{ni} = \sum_j a_{ij}b_{ji}$. Hence
$$\operatorname{tr}(AB) = d_1 + d_2 + \dots + d_n = \sum_i \Big( \sum_j a_{ij}b_{ji} \Big)$$
Similarly, $\operatorname{tr}(BA) = \sum_i \big( \sum_j b_{ij}a_{ji} \big)$. Since these two double sums are the same, Lemma 5.5.1 is proved.
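A quick numerical check of the lemma (random matrices, our illustration): the products $AB$ and $BA$ differ, yet their traces agree.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

# tr(AB) = sum over i, j of a_ij * b_ji, which is symmetric in A and B,
# even though AB != BA in general.
assert np.isclose(np.trace(A @ B), np.trace(B @ A))
assert not np.allclose(A @ B, B @ A)     # the products themselves differ
```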

Theorem 5.5.1

If $A$ and $B$ are similar matrices, then $A$ and $B$ have the same determinant, rank, trace, characteristic polynomial, and eigenvalues.

Proof:

Let $B = P^{-1}AP$ for some invertible matrix $P$. Then we have
$$\det B = \det(P^{-1}) \det A \det P = \det A \quad \text{because } \det(P^{-1}) = 1/\det P$$
Similarly, $\operatorname{rank} B = \operatorname{rank}(P^{-1}AP) = \operatorname{rank} A$ by Corollary 5.4.2. Next Lemma 5.5.1 gives
$$\operatorname{tr} B = \operatorname{tr}(P^{-1}(AP)) = \operatorname{tr}((AP)P^{-1}) = \operatorname{tr} A$$
As to the characteristic polynomial,
$$c_B(x) = \det(xI - B) = \det(xI - P^{-1}AP) = \det(P^{-1}(xI - A)P) = \det(xI - A) = c_A(x)$$
Finally, this shows that $A$ and $B$ have the same eigenvalues because the eigenvalues of a matrix are the roots of its characteristic polynomial.

Example 5.5.2

Sharing the five properties in Theorem 5.5.1 does not guarantee that two matrices are similar. The two matrices shown
have the same determinant, rank, trace, characteristic polynomial, and eigenvalues, but they are not similar because $P^{-1}IP = I \ne A$ for any invertible matrix $P$.
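The example's matrices are not reproduced above, so the pair below is our own choice; it is the standard illustration of this phenomenon, the $2 \times 2$ identity $I$ and a Jordan block $J$, which share all five invariants yet cannot be similar because $I$ is similar only to itself:

```python
import numpy as np

# Our illustrative pair: the identity I and a Jordan block J.
I2 = np.eye(2)
J = np.array([[1., 1.],
              [0., 1.]])

assert np.isclose(np.linalg.det(I2), np.linalg.det(J))         # det = 1
assert np.linalg.matrix_rank(I2) == np.linalg.matrix_rank(J)   # rank = 2
assert np.isclose(np.trace(I2), np.trace(J))                   # trace = 2
assert np.allclose(np.sort(np.linalg.eigvals(I2)),
                   np.sort(np.linalg.eigvals(J)))              # eigenvalues 1, 1

# Same characteristic polynomial (x - 1)^2 too, yet I ~ J is impossible:
# P^{-1} I P = I != J for every invertible P.
```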

## Diagonalization Revisited

Recall that a square matrix $A$ is diagonalizable if there exists an invertible matrix $P$ such that $P^{-1}AP$ is a diagonal matrix, that is, if $A$ is similar to a diagonal matrix. Unfortunately, not all matrices are diagonalizable. Determining whether $A$ is diagonalizable is closely related to the eigenvalues and eigenvectors of $A$. Recall that a number $\lambda$ is called an eigenvalue of $A$ if $A\mathbf{x} = \lambda\mathbf{x}$ for some nonzero column $\mathbf{x}$ in $\mathbb{R}^n$, and any such nonzero vector $\mathbf{x}$ is called an eigenvector of $A$ corresponding to $\lambda$ (or simply a $\lambda$-eigenvector of $A$). The eigenvalues and eigenvectors of $A$ are closely related to the characteristic polynomial $c_A(x)$ of $A$, defined by
$$c_A(x) = \det(xI - A)$$

If $A$ is $n \times n$, this is a polynomial of degree $n$, and its relationship to the eigenvalues is given in the following theorem.

Theorem 5.5.2

Let $A$ be an $n \times n$ matrix.

1. The eigenvalues $\lambda$ of $A$ are the roots of the characteristic polynomial $c_A(x)$ of $A$.
2. The $\lambda$-eigenvectors $\mathbf{x}$ are the nonzero solutions to the homogeneous system $(\lambda I - A)\mathbf{x} = \mathbf{0}$ of linear equations with $\lambda I - A$ as coefficient matrix.

The next theorem gives the condition under which a square matrix is diagonalizable.

Theorem 5.5.3

Let $A$ be an $n \times n$ matrix.

1. $A$ is diagonalizable if and only if $\mathbb{R}^n$ has a basis $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n\}$ consisting of eigenvectors of $A$.
2. When this is the case, the matrix $P = [\mathbf{x}_1 \ \mathbf{x}_2 \ \cdots \ \mathbf{x}_n]$ is invertible and $P^{-1}AP = \operatorname{diag}(\lambda_1, \lambda_2, \dots, \lambda_n)$ where, for each $i$, $\lambda_i$ is the eigenvalue of $A$ corresponding to $\mathbf{x}_i$.
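Part (2) is exactly what `numpy.linalg.eig` returns: the eigenvalues together with a matrix $P$ whose columns are corresponding eigenvectors. A sketch with an illustrative matrix of our own:

```python
import numpy as np

# Illustrative matrix (ours, not from the text) with distinct eigenvalues 2 and 3.
A = np.array([[2., 1.],
              [0., 3.]])

# eig returns the eigenvalues and, as columns of P, matching eigenvectors.
eigvals, P = np.linalg.eig(A)

D = np.linalg.inv(P) @ A @ P            # should equal diag(eigvals)
assert np.allclose(D, np.diag(eigvals))

# Each column of P is an eigenvector: A x_i = lambda_i x_i.
for i, lam in enumerate(eigvals):
    assert np.allclose(A @ P[:, i], lam * P[:, i])
```

Because the two eigenvalues are distinct, the eigenvectors are independent (Theorem 5.5.4), so $P$ is guaranteed to be invertible here.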

The next result is a basic tool for determining when a matrix $A$ is diagonalizable. It reveals an important connection between eigenvalues and linear independence: eigenvectors corresponding to distinct eigenvalues are necessarily linearly independent.

Theorem 5.5.4

Let $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k$ be eigenvectors corresponding to distinct eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_k$ of an $n \times n$ matrix $A$. Then $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k\}$ is a linearly independent set.

Theorem 5.5.5

If $A$ is an $n \times n$ matrix with $n$ distinct eigenvalues, then $A$ is diagonalizable.

Example 5.5.4

Show that the given matrix $A$ is diagonalizable.

Solution:

A routine computation of the characteristic polynomial $c_A(x)$ shows that $A$ has three distinct eigenvalues. Hence Theorem 5.5.5 applies.

Definition 5.13 Eigenspace of a Matrix

If $\lambda$ is an eigenvalue of an $n \times n$ matrix $A$, define the eigenspace of $A$ corresponding to $\lambda$ by
$$E_\lambda(A) = \{\mathbf{x} \in \mathbb{R}^n \mid A\mathbf{x} = \lambda\mathbf{x}\}$$

This is a subspace of $\mathbb{R}^n$, and the eigenvectors corresponding to $\lambda$ are just the nonzero vectors in $E_\lambda(A)$. In fact, $E_\lambda(A)$ is the null space of the matrix $\lambda I - A$:
$$E_\lambda(A) = \{\mathbf{x} \mid (\lambda I - A)\mathbf{x} = \mathbf{0}\} = \text{null}(\lambda I - A)$$

The basic solutions of the homogeneous system $(\lambda I - A)\mathbf{x} = \mathbf{0}$ given by the gaussian algorithm form a basis for $E_\lambda(A)$. In particular,
$$\dim E_\lambda(A) \text{ is the number of basic solutions of } (\lambda I - A)\mathbf{x} = \mathbf{0} \quad (5.5)$$

Now recall that the multiplicity of an eigenvalue $\lambda$ of $A$ is the number of times $\lambda$ occurs as a root of the characteristic polynomial $c_A(x)$ of $A$. In other words, the multiplicity of $\lambda$ is the largest integer $m$ such that
$$c_A(x) = (x - \lambda)^m g(x)$$
for some polynomial $g(x)$. Because of (5.5), a square matrix is diagonalizable if and only if the multiplicity of each eigenvalue $\lambda$ equals $\dim E_\lambda(A)$. We are going to prove this, and the proof requires the following result, which is valid for any square matrix, diagonalizable or not.

Lemma 5.5.3

Let $\lambda$ be an eigenvalue of multiplicity $m$ of a square matrix $A$. Then $\dim E_\lambda(A) \le m$.
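A numerical illustration of this inequality (both matrices below are our own examples): for the identity, the eigenspace dimension equals the multiplicity, while for a Jordan block it falls strictly short.

```python
import numpy as np

def eigenspace_dim(A, lam, tol=1e-8):
    """dim E_lambda(A) = dim null(lam*I - A), computed via the SVD."""
    M = lam * np.eye(A.shape[0]) - A
    s = np.linalg.svd(M, compute_uv=False)
    return int(np.sum(s < tol))

# lambda = 1 has multiplicity 2 in both characteristic polynomials (x - 1)^2,
# but only the first matrix is diagonalizable.
D = np.array([[1., 0.],
              [0., 1.]])   # dim E_1 = 2 = multiplicity  -> diagonalizable
J = np.array([[1., 1.],
              [0., 1.]])   # dim E_1 = 1 < multiplicity  -> not diagonalizable

assert eigenspace_dim(D, 1.0) == 2
assert eigenspace_dim(J, 1.0) == 1
```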

It turns out that this characterizes the diagonalizable $n \times n$ matrices $A$ for which $c_A(x)$ factors completely over $\mathbb{R}$. By this we mean that $c_A(x) = (x - \lambda_1)(x - \lambda_2) \cdots (x - \lambda_n)$, where the $\lambda_i$ are real numbers (not necessarily distinct); in other words, every eigenvalue of $A$ is real. This need not happen (a rotation of $\mathbb{R}^2$ through $90°$ has no real eigenvalues), which leads us to the general conclusion regarding when a square matrix is diagonalizable.

Theorem 5.5.6

The following are equivalent for a square matrix $A$ for which $c_A(x)$ factors completely:

1. $A$ is diagonalizable.
2. $\dim E_\lambda(A)$ equals the multiplicity of $\lambda$ for every eigenvalue $\lambda$ of the matrix $A$.

Example 5.5.5

For the matrices $A$ and $B$ given, show that $A$ is diagonalizable but $B$ is not.

Solution:

We compute $c_A(x)$, so the eigenvalues are found; call them $\lambda_1$ and $\lambda_2$. The corresponding eigenspaces are $E_{\lambda_1}(A)$ and $E_{\lambda_2}(A)$, where

as the reader can verify. Since the displayed set of eigenvectors is independent, $\dim E_{\lambda_2}(A)$ equals the multiplicity of $\lambda_2$. Similarly, $\dim E_{\lambda_1}(A)$ equals the multiplicity of $\lambda_1$. Hence $A$ is diagonalizable
by Theorem 5.5.6, and a diagonalizing matrix $P$ is obtained by using these basis eigenvectors as its columns.

Turning to $B$, we compute $c_B(x)$ and find the same eigenvalues $\lambda_1$ and $\lambda_2$. The corresponding eigenspaces are $E_{\lambda_1}(B)$ and $E_{\lambda_2}(B)$, where

Here the dimension of the eigenspace of the repeated eigenvalue is smaller than its multiplicity, so the matrix $B$ is not diagonalizable, again by Theorem 5.5.6. An eigenspace dimension of 1 against a multiplicity of 2 means that there is no possibility of finding three linearly independent eigenvectors.