5 Vector Space $\mathbb{R}^n$
5.1 Subspaces and Spanning
In Section 2.2 we introduced the set $\mathbb{R}^n$ of all $n$-tuples (called \textit{vectors}), and began our investigation of the matrix transformations $\mathbb{R}^n \to \mathbb{R}^m$ given by multiplication by an $m \times n$ matrix. Particular attention was paid to the euclidean plane $\mathbb{R}^2$, where certain simple geometric transformations were seen to be matrix transformations. In this chapter we investigate $\mathbb{R}^n$ in full generality, and introduce some of the most important concepts and methods in linear algebra. The $n$-tuples in $\mathbb{R}^n$ will continue to be denoted $\mathbf{x}$, $\mathbf{y}$, and so on, and will be written as rows or columns depending on the context.

Subspaces of $\mathbb{R}^n$

Definition 5.1 Subspaces of $\mathbb{R}^n$
A set $U$ of vectors in $\mathbb{R}^n$ is called a subspace of $\mathbb{R}^n$ if it satisfies the following properties:
S1. The zero vector $\mathbf{0}$ is in $U$.
S2. If $\mathbf{x}$ and $\mathbf{y}$ are in $U$, then $\mathbf{x} + \mathbf{y}$ is in $U$.
S3. If $\mathbf{x}$ is in $U$, then $a\mathbf{x}$ is in $U$ for every real number $a$.

We say that the subset $U$ is closed under addition if S2 holds, and that $U$ is closed under scalar multiplication if S3 holds.

Clearly $\mathbb{R}^n$ is a subspace of itself, and this chapter is about these subspaces and their properties. The set $U = \{\mathbf{0}\}$, consisting of only the zero vector, is also a subspace because $\mathbf{0} + \mathbf{0} = \mathbf{0}$ and $a\mathbf{0} = \mathbf{0}$ for each $a$ in $\mathbb{R}$; it is called the zero subspace. Any subspace of $\mathbb{R}^n$ other than $\{\mathbf{0}\}$ or $\mathbb{R}^n$ is called a proper subspace.
We saw in Section 4.2 that every plane $M$ through the origin in $\mathbb{R}^3$ has equation $ax + by + cz = 0$ where $a$, $b$, and $c$ are not all zero. Here $\mathbf{n} = (a, b, c)$ is a normal for the plane and
$M = \{\mathbf{v} \in \mathbb{R}^3 \mid \mathbf{n} \cdot \mathbf{v} = 0\}$
where $\mathbf{v} = (x, y, z)$ and $\mathbf{n} \cdot \mathbf{v}$ denotes the dot product introduced in Section 2.2 (see the diagram). Then $M$ is a subspace of $\mathbb{R}^3$. Indeed we show that $M$ satisfies S1, S2, and S3 as follows:
S1. $\mathbf{0}$ is in $M$ because $\mathbf{n} \cdot \mathbf{0} = 0$;
S2. If $\mathbf{v}$ and $\mathbf{v}_1$ are in $M$, then $\mathbf{n} \cdot (\mathbf{v} + \mathbf{v}_1) = \mathbf{n} \cdot \mathbf{v} + \mathbf{n} \cdot \mathbf{v}_1 = 0 + 0 = 0$, so $\mathbf{v} + \mathbf{v}_1$ is in $M$;
S3. If $\mathbf{v}$ is in $M$, then $\mathbf{n} \cdot (a\mathbf{v}) = a(\mathbf{n} \cdot \mathbf{v}) = a \cdot 0 = 0$, so $a\mathbf{v}$ is in $M$.
Example 5.1.1
Planes and lines through the origin in $\mathbb{R}^3$ are all subspaces of $\mathbb{R}^3$.
Solution:
We proved the statement for planes above. If $L$ is a line through the origin with direction vector $\mathbf{d}$, then $L = \{t\mathbf{d} \mid t \in \mathbb{R}\}$. Can you verify that $L$ satisfies S1, S2, and S3?
Example 5.1.1 shows that lines through the origin in $\mathbb{R}^2$ are subspaces; in fact, they are the only proper subspaces of $\mathbb{R}^2$. Indeed, we will prove that lines and planes through the origin in $\mathbb{R}^3$ are the only proper subspaces of $\mathbb{R}^3$ (Example 5.2.14). Thus the geometry of lines and planes through the origin is captured by the subspace concept. (Note that every line or plane is just a translation of one of these.)
Subspaces can also be used to describe important features of an $m \times n$ matrix $A$. The null space of $A$, denoted $\text{null } A$, and the image space of $A$, denoted $\text{im } A$, are defined by
$\text{null } A = \{\mathbf{x} \in \mathbb{R}^n \mid A\mathbf{x} = \mathbf{0}\}$ and $\text{im } A = \{A\mathbf{x} \mid \mathbf{x} \in \mathbb{R}^n\}$
In the language of Chapter 2, $\text{null } A$ consists of all solutions $\mathbf{x}$ in $\mathbb{R}^n$ of the homogeneous system $A\mathbf{x} = \mathbf{0}$, and $\text{im } A$ is the set of all vectors $\mathbf{y}$ in $\mathbb{R}^m$ such that $A\mathbf{x} = \mathbf{y}$ has a solution $\mathbf{x}$. Note that $\mathbf{x}$ is in $\text{null } A$ if it satisfies the condition $A\mathbf{x} = \mathbf{0}$, while $\text{im } A$ consists of vectors of the form $A\mathbf{x}$ for some $\mathbf{x}$ in $\mathbb{R}^n$. These two ways to describe subsets occur frequently.
Example 5.1.2
If $A$ is an $m \times n$ matrix, then:
$\text{null } A$ is a subspace of $\mathbb{R}^n$.
$\text{im } A$ is a subspace of $\mathbb{R}^m$.
Solution:
- The zero vector $\mathbf{0}$ of $\mathbb{R}^n$ lies in $\text{null } A$ because $A\mathbf{0} = \mathbf{0}$. If $\mathbf{x}$ and $\mathbf{x}_1$ are in $\text{null } A$, then $\mathbf{x} + \mathbf{x}_1$ and $a\mathbf{x}$ are in $\text{null } A$ because they satisfy the required condition:
$A(\mathbf{x} + \mathbf{x}_1) = A\mathbf{x} + A\mathbf{x}_1 = \mathbf{0} + \mathbf{0} = \mathbf{0}$ and $A(a\mathbf{x}) = a(A\mathbf{x}) = a\mathbf{0} = \mathbf{0}$
Hence $\text{null } A$ satisfies S1, S2, and S3, and so is a subspace of $\mathbb{R}^n$.
- The zero vector $\mathbf{0}$ of $\mathbb{R}^m$ lies in $\text{im } A$ because $\mathbf{0} = A\mathbf{0}$. Suppose that $\mathbf{y}$ and $\mathbf{y}_1$ are in $\text{im } A$, say $\mathbf{y} = A\mathbf{x}$ and $\mathbf{y}_1 = A\mathbf{x}_1$ where $\mathbf{x}$ and $\mathbf{x}_1$ are in $\mathbb{R}^n$. Then
$\mathbf{y} + \mathbf{y}_1 = A\mathbf{x} + A\mathbf{x}_1 = A(\mathbf{x} + \mathbf{x}_1)$ and $a\mathbf{y} = a(A\mathbf{x}) = A(a\mathbf{x})$
show that $\mathbf{y} + \mathbf{y}_1$ and $a\mathbf{y}$ are both in $\text{im } A$ (they have the required form). Hence $\text{im } A$ is a subspace of $\mathbb{R}^m$.
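To make the two descriptions concrete, here is a small computational sketch. The matrix $A$ below is a made-up example (not one from the text); membership in $\text{null } A$ is tested directly, and membership in $\text{im } A$ is tested by solving $A\mathbf{x} = \mathbf{y}$ with least squares.

```python
import numpy as np

# A hypothetical 3x4 matrix used only for illustration.
A = np.array([[1., 2., 0., 1.],
              [0., 1., 1., 2.],
              [1., 3., 1., 3.]])

def in_null(A, x, tol=1e-10):
    """x is in null A exactly when A @ x is the zero vector."""
    return np.allclose(A @ x, 0.0, atol=tol)

def in_image(A, y, tol=1e-10):
    """y is in im A when A @ x = y has a solution x (checked via least squares)."""
    x, *_ = np.linalg.lstsq(A, y, rcond=None)
    return np.allclose(A @ x, y, atol=tol)

x = np.array([2., -1., 1., 0.])              # satisfies A @ x = 0, so x is in null A
print(in_null(A, x))                         # True
print(in_image(A, np.array([1., 1., 2.])))   # True: A x = (1, 1, 2) is consistent
```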
There are other important subspaces associated with a matrix $A$ that clarify basic properties of $A$. If $A$ is an $n \times n$ matrix and $\lambda$ is any number, let
$E_\lambda(A) = \{\mathbf{x} \in \mathbb{R}^n \mid A\mathbf{x} = \lambda\mathbf{x}\}$
A vector $\mathbf{x}$ is in $E_\lambda(A)$ if and only if $(\lambda I - A)\mathbf{x} = \mathbf{0}$, so Example 5.1.2 gives:
Example 5.1.3
$E_\lambda(A) = \text{null}(\lambda I - A)$ is a subspace of $\mathbb{R}^n$ for each $n \times n$ matrix $A$ and number $\lambda$.
$E_\lambda(A)$ is called the eigenspace of $A$ corresponding to $\lambda$. The reason for the name is that, in the terminology of Section 3.3, $\lambda$ is an eigenvalue of $A$ if $E_\lambda(A) \neq \{\mathbf{0}\}$. In this case the nonzero vectors in $E_\lambda(A)$ are called the eigenvectors of $A$ corresponding to $\lambda$.
The reader should not get the impression that every subset of $\mathbb{R}^n$ is a subspace. For example, a subset may satisfy S1 and S2 but fail S3, or satisfy S1 and S3 but fail S2. Neither kind of subset is a subspace of $\mathbb{R}^n$.
Spanning sets
Let $\mathbf{v}$ and $\mathbf{w}$ be two nonzero, nonparallel vectors in $\mathbb{R}^3$ with their tails at the origin. The plane $M$ through the origin containing these vectors is described in Section 4.2 by saying that $\mathbf{n} = \mathbf{v} \times \mathbf{w}$ is a normal for $M$, and that $M$ consists of all vectors $\mathbf{p}$ such that $\mathbf{n} \cdot \mathbf{p} = 0$.
While this is a very useful way to look at planes, there is another approach that is at least as useful in $\mathbb{R}^3$ and, more importantly, works for all subspaces of $\mathbb{R}^n$ for any $n \geq 1$.
The idea is as follows: Observe that, by the diagram, a vector $\mathbf{p}$ is in $M$ if and only if it has the form
$\mathbf{p} = a\mathbf{v} + b\mathbf{w}$
for certain real numbers $a$ and $b$ (we say that $\mathbf{p}$ is a linear combination of $\mathbf{v}$ and $\mathbf{w}$).
Hence we can describe $M$ as
$M = \{a\mathbf{v} + b\mathbf{w} \mid a, b \in \mathbb{R}\}$
and we say that $\{\mathbf{v}, \mathbf{w}\}$ is a spanning set for $M$. It is this notion of a spanning set that provides a way to describe all subspaces of $\mathbb{R}^n$.
As in Section 1.3, given vectors $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k$ in $\mathbb{R}^n$, a vector of the form
$t_1\mathbf{x}_1 + t_2\mathbf{x}_2 + \cdots + t_k\mathbf{x}_k$ where the $t_i$ are scalars
is called a linear combination of the $\mathbf{x}_i$, and $t_i$ is called the coefficient of $\mathbf{x}_i$ in the linear combination.
Definition 5.2 Linear Combinations and Span in $\mathbb{R}^n$
The set of all such linear combinations is called the span of the $\mathbf{x}_i$ and is denoted
$\text{span}\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k\} = \{t_1\mathbf{x}_1 + t_2\mathbf{x}_2 + \cdots + t_k\mathbf{x}_k \mid t_i \in \mathbb{R}\}$
If $V = \text{span}\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k\}$, we say that $V$ is spanned by the vectors $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k$, and that the vectors $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k$ span the space $V$.
Here are two examples: $\text{span}\{\mathbf{x}\} = \{t\mathbf{x} \mid t \in \mathbb{R}\}$, which we write as $\text{span}\{\mathbf{x}\} = \mathbb{R}\mathbf{x}$ for simplicity.
In particular, the above discussion shows that, if $\mathbf{v}$ and $\mathbf{w}$ are two nonzero, nonparallel vectors in $\mathbb{R}^3$, then
$M = \text{span}\{\mathbf{v}, \mathbf{w}\}$
is the plane in $\mathbb{R}^3$ containing $\mathbf{v}$ and $\mathbf{w}$. Moreover, if $\mathbf{d}$ is any nonzero vector in $\mathbb{R}^3$ (or $\mathbb{R}^2$), then
$L = \text{span}\{\mathbf{d}\} = \{t\mathbf{d} \mid t \in \mathbb{R}\} = \mathbb{R}\mathbf{d}$
is the line with direction vector $\mathbf{d}$. Hence lines and planes can both be described in terms of spanning sets.
Example 5.1.4
Let $\mathbf{x} = (2, -1, 2, 1)$ and $\mathbf{y} = (3, 4, -1, 1)$ in $\mathbb{R}^4$. Determine whether $\mathbf{p} = (0, -11, 8, 1)$ or $\mathbf{q} = (2, 3, 1, 2)$ are in $U = \text{span}\{\mathbf{x}, \mathbf{y}\}$.
Solution:
The vector $\mathbf{p}$ is in $U$ if and only if $\mathbf{p} = s\mathbf{x} + t\mathbf{y}$ for scalars $s$ and $t$. Equating components gives equations
$2s + 3t = 0$, $-s + 4t = -11$, $2s - t = 8$, $s + t = 1$
This linear system has solution $s = 3$ and $t = -2$, so $\mathbf{p}$ is in $U$. On the other hand, asking that $\mathbf{q} = s\mathbf{x} + t\mathbf{y}$ leads to equations
$2s + 3t = 2$, $-s + 4t = 3$, $2s - t = 1$, $s + t = 2$
and this system has no solution. So $\mathbf{q}$ does not lie in $U$.
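A quick numerical check of this computation (a sketch using NumPy's least-squares routine; any way of solving the small linear system would do):

```python
import numpy as np

# Columns of M are the spanning vectors x and y from Example 5.1.4.
x = np.array([2., -1., 2., 1.])
y = np.array([3., 4., -1., 1.])
M = np.column_stack([x, y])

def in_span(M, v, tol=1e-10):
    """v lies in the column span of M exactly when the least-squares
    residual of M s = v is (numerically) zero."""
    s, *_ = np.linalg.lstsq(M, v, rcond=None)
    return np.allclose(M @ s, v, atol=tol), s

p = np.array([0., -11., 8., 1.])
q = np.array([2., 3., 1., 2.])
print(in_span(M, p))   # (True, array([ 3., -2.]))  -> p = 3x - 2y
print(in_span(M, q))   # (False, ...)               -> q is not in span{x, y}
```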
Theorem 5.1.1
Let $U = \text{span}\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k\}$ in $\mathbb{R}^n$. Then:
- $U$ is a subspace of $\mathbb{R}^n$ containing each $\mathbf{x}_i$.
- If $W$ is a subspace of $\mathbb{R}^n$ and each $\mathbf{x}_i \in W$, then $U \subseteq W$.
Proof:
- The zero vector $\mathbf{0}$ is in $U$ because $\mathbf{0} = 0\mathbf{x}_1 + 0\mathbf{x}_2 + \cdots + 0\mathbf{x}_k$ is a linear combination of the $\mathbf{x}_i$. If $\mathbf{x} = t_1\mathbf{x}_1 + \cdots + t_k\mathbf{x}_k$ and $\mathbf{y} = s_1\mathbf{x}_1 + \cdots + s_k\mathbf{x}_k$ are in $U$, then $\mathbf{x} + \mathbf{y}$ and $a\mathbf{x}$ are in $U$ because
$\mathbf{x} + \mathbf{y} = (t_1 + s_1)\mathbf{x}_1 + \cdots + (t_k + s_k)\mathbf{x}_k$ and $a\mathbf{x} = (at_1)\mathbf{x}_1 + \cdots + (at_k)\mathbf{x}_k$
Finally each $\mathbf{x}_i$ is in $U$ (for example, $\mathbf{x}_2 = 0\mathbf{x}_1 + 1\mathbf{x}_2 + \cdots + 0\mathbf{x}_k$) so S1, S2, and S3 are satisfied for $U$, proving (1).
- Let $\mathbf{x} = t_1\mathbf{x}_1 + t_2\mathbf{x}_2 + \cdots + t_k\mathbf{x}_k$ where the $t_i$ are scalars and each $\mathbf{x}_i \in W$. Then each $t_i\mathbf{x}_i \in W$ because $W$ satisfies S3. But then $\mathbf{x} \in W$ because $W$ satisfies S2 (verify). This proves (2).
Condition (2) in Theorem 5.1.1 can be expressed by saying that $\text{span}\{\mathbf{x}_1, \dots, \mathbf{x}_k\}$ is the smallest subspace of $\mathbb{R}^n$ that contains each $\mathbf{x}_i$. This is useful for showing that two subspaces $U$ and $W$ are equal, since this amounts to showing that both $U \subseteq W$ and $W \subseteq U$. Here is an example of how it is used.
Example 5.1.5
If $\mathbf{x}$ and $\mathbf{y}$ are in $\mathbb{R}^n$, show that $\text{span}\{\mathbf{x}, \mathbf{y}\} = \text{span}\{\mathbf{x} + \mathbf{y}, \mathbf{x} - \mathbf{y}\}$.
Solution:
Since both $\mathbf{x} + \mathbf{y}$ and $\mathbf{x} - \mathbf{y}$ are in $\text{span}\{\mathbf{x}, \mathbf{y}\}$, Theorem 5.1.1 gives
$\text{span}\{\mathbf{x} + \mathbf{y}, \mathbf{x} - \mathbf{y}\} \subseteq \text{span}\{\mathbf{x}, \mathbf{y}\}$
But $\mathbf{x} = \frac{1}{2}(\mathbf{x} + \mathbf{y}) + \frac{1}{2}(\mathbf{x} - \mathbf{y})$ and $\mathbf{y} = \frac{1}{2}(\mathbf{x} + \mathbf{y}) - \frac{1}{2}(\mathbf{x} - \mathbf{y})$ are both in $\text{span}\{\mathbf{x} + \mathbf{y}, \mathbf{x} - \mathbf{y}\}$, so
$\text{span}\{\mathbf{x}, \mathbf{y}\} \subseteq \text{span}\{\mathbf{x} + \mathbf{y}, \mathbf{x} - \mathbf{y}\}$
again by Theorem 5.1.1. Thus $\text{span}\{\mathbf{x}, \mathbf{y}\} = \text{span}\{\mathbf{x} + \mathbf{y}, \mathbf{x} - \mathbf{y}\}$, as desired.
It turns out that many important subspaces are best described by giving a spanning set. Here are three examples, beginning with an important spanning set for $\mathbb{R}^n$ itself. Column $j$ of the $n \times n$ identity matrix $I_n$ is denoted $\mathbf{e}_j$ and called the $j$th coordinate vector in $\mathbb{R}^n$, and the set $\{\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n\}$ is called the standard basis of $\mathbb{R}^n$. If $\mathbf{x} = (x_1, x_2, \dots, x_n)$ is any vector in $\mathbb{R}^n$, then $\mathbf{x} = x_1\mathbf{e}_1 + x_2\mathbf{e}_2 + \cdots + x_n\mathbf{e}_n$, as the reader can verify. This proves:
Example 5.1.6
$\mathbb{R}^n = \text{span}\{\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n\}$ where $\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n$ are the columns of $I_n$.
If $A$ is an $m \times n$ matrix, the next two examples show that it is a routine matter to find spanning sets for $\text{null } A$ and $\text{im } A$.
Example 5.1.7
Given an $m \times n$ matrix $A$, let $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k$ denote the basic solutions to the system $A\mathbf{x} = \mathbf{0}$ given by the gaussian algorithm. Then
$\text{null } A = \text{span}\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k\}$
Solution:
If $\mathbf{x} \in \text{null } A$, then $A\mathbf{x} = \mathbf{0}$, so Theorem 1.3.2 shows that $\mathbf{x}$ is a linear combination of the basic solutions; that is, $\text{null } A \subseteq \text{span}\{\mathbf{x}_1, \dots, \mathbf{x}_k\}$. On the other hand, if $\mathbf{x}$ is in $\text{span}\{\mathbf{x}_1, \dots, \mathbf{x}_k\}$, then $\mathbf{x} = t_1\mathbf{x}_1 + \cdots + t_k\mathbf{x}_k$ for scalars $t_i$, so
$A\mathbf{x} = t_1A\mathbf{x}_1 + \cdots + t_kA\mathbf{x}_k = t_1\mathbf{0} + \cdots + t_k\mathbf{0} = \mathbf{0}$
This shows that $\mathbf{x} \in \text{null } A$, and hence that $\text{span}\{\mathbf{x}_1, \dots, \mathbf{x}_k\} \subseteq \text{null } A$. Thus we have equality.
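The basic solutions can be computed mechanically. A short sketch using SymPy (the matrix is an assumed example, not one from the text):

```python
from sympy import Matrix

# A hypothetical 3x4 matrix; its null space is 2-dimensional.
A = Matrix([[1, 2, 0, 1],
            [0, 1, 1, 2],
            [1, 3, 1, 3]])

basic = A.nullspace()          # basic solutions of A x = 0
for x in basic:
    print(x.T, (A * x).T)      # each product is the zero vector

# Every linear combination of the basic solutions is again in null A:
x = 3 * basic[0] - 2 * basic[1]
print((A * x).T)               # zero vector, as Example 5.1.7 predicts
```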
Example 5.1.8
Let $\mathbf{c}_1, \mathbf{c}_2, \dots, \mathbf{c}_n$ denote the columns of the $m \times n$ matrix $A$. Then
$\text{im } A = \text{span}\{\mathbf{c}_1, \mathbf{c}_2, \dots, \mathbf{c}_n\}$
Solution:
If $\{\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n\}$ is the standard basis of $\mathbb{R}^n$, observe that
$A\mathbf{e}_j = \mathbf{c}_j$ for each $j$
Hence $\mathbf{c}_j$ is in $\text{im } A$ for each $j$, so $\text{span}\{\mathbf{c}_1, \dots, \mathbf{c}_n\} \subseteq \text{im } A$.
Conversely, let $\mathbf{y}$ be in $\text{im } A$, say $\mathbf{y} = A\mathbf{x}$ for some $\mathbf{x}$ in $\mathbb{R}^n$. If $\mathbf{x} = (x_1, x_2, \dots, x_n)$, then Definition 2.5 gives
$\mathbf{y} = A\mathbf{x} = x_1\mathbf{c}_1 + x_2\mathbf{c}_2 + \cdots + x_n\mathbf{c}_n \in \text{span}\{\mathbf{c}_1, \dots, \mathbf{c}_n\}$
This shows that $\text{im } A \subseteq \text{span}\{\mathbf{c}_1, \dots, \mathbf{c}_n\}$, and the result follows.
GeoGebra Exercise: Linear Combination of Vectors
https://www.geogebra.org/m/Q4hT3V5N
Please answer these questions after you open the webpage:
1. Set the vectors $\mathbf{u}$ and $\mathbf{v}$, and set the point $P$.
2. Click "start animation" to see which linear combination of $\mathbf{u}$ and $\mathbf{v}$ will produce the vector $\vec{OP}$.
3. Change $\mathbf{u}$ and $\mathbf{v}$. Randomly choose a point $P$.
4. Click "start animation" to see which linear combination of $\mathbf{u}$ and $\mathbf{v}$ will produce the vector $\vec{OP}$. Write it down.
5. What if we set $\mathbf{u}$ and $\mathbf{v}$ to be parallel? Can you explain what's happening now? Would you still be able to find a linear combination of $\mathbf{u}$ and $\mathbf{v}$ that produces the vector $\vec{OP}$?
5.2 Independence and Dimension
Some spanning sets are better than others. If $U = \text{span}\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k\}$ is a subspace of $\mathbb{R}^n$, then every vector in $U$ can be written as a linear combination of the $\mathbf{x}_i$ in at least one way. Our interest here is in spanning sets where each vector in $U$ has exactly one representation as a linear combination of these vectors.
Linear Independence
Given $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k$ in $\mathbb{R}^n$, suppose that two linear combinations are equal:
$r_1\mathbf{x}_1 + r_2\mathbf{x}_2 + \cdots + r_k\mathbf{x}_k = s_1\mathbf{x}_1 + s_2\mathbf{x}_2 + \cdots + s_k\mathbf{x}_k$
We are looking for a condition on the set $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k\}$ of vectors that guarantees that this representation is unique; that is, $r_i = s_i$ for each $i$. Taking all terms to the left side gives
$(r_1 - s_1)\mathbf{x}_1 + (r_2 - s_2)\mathbf{x}_2 + \cdots + (r_k - s_k)\mathbf{x}_k = \mathbf{0}$
so the required condition is that this equation forces all the coefficients $r_i - s_i$ to be zero.
Definition 5.3 Linear Independence in $\mathbb{R}^n$
We call a set $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k\}$ of vectors linearly independent (or simply independent) if it satisfies the following condition:
If $t_1\mathbf{x}_1 + t_2\mathbf{x}_2 + \cdots + t_k\mathbf{x}_k = \mathbf{0}$, then $t_1 = t_2 = \cdots = t_k = 0$
Theorem 5.2.1
If $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k\}$ is an independent set of vectors in $\mathbb{R}^n$, then every vector in $\text{span}\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k\}$ has a unique representation as a linear combination of the $\mathbf{x}_i$.
It is useful to state the definition of independence in different language. Let us say that a linear combination vanishes if it equals the zero vector, and call a linear combination trivial if every coefficient is zero. Then the definition of independence can be compactly stated as follows:
A set of vectors is independent if and only if the only linear combination that vanishes is the trivial one.
Hence we have a procedure for checking that a set of vectors is independent:
Independence Test
To verify that a set $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k\}$ of vectors in $\mathbb{R}^n$ is independent, proceed as follows:
- Set a linear combination equal to zero: $t_1\mathbf{x}_1 + t_2\mathbf{x}_2 + \cdots + t_k\mathbf{x}_k = \mathbf{0}$.
- Show that $t_i = 0$ for each $i$ (that is, the linear combination is trivial).
Of course, if some nontrivial linear combination vanishes, the vectors are not independent.
Example 5.2.1
Determine whether is independent in
.
Solution:
Suppose a linear combination vanishes:
Equating corresponding entries gives a system of four equations:
The only solution is the trivial one (please verify), so these vectors are independent by the independence test.
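The independence test amounts to checking that the homogeneous system with the given vectors as columns has only the trivial solution. A sketch in SymPy (the vectors used here are placeholders, not necessarily those of Example 5.2.1):

```python
from sympy import Matrix

def is_independent(vectors):
    """Columns are independent iff t1 v1 + ... + tk vk = 0 forces all ti = 0,
    i.e. iff the matrix of columns has rank k (trivial null space)."""
    M = Matrix.hstack(*[Matrix(v) for v in vectors])
    return M.rank() == len(vectors)

# Placeholder vectors in R^4 for illustration only.
print(is_independent([[1, 0, -2, 5], [2, 1, 0, -1], [1, 1, 2, 1]]))   # True
print(is_independent([[1, 2, 3, 4], [2, 4, 6, 8], [0, 1, 0, 1]]))     # False: second = 2 * first
```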
Example 5.2.2
Show that the standard basis $\{\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n\}$ of $\mathbb{R}^n$ is independent.
Solution:
The components of the linear combination $t_1\mathbf{e}_1 + t_2\mathbf{e}_2 + \cdots + t_n\mathbf{e}_n$ are precisely $t_1, t_2, \dots, t_n$. So the linear combination vanishes if and only if each $t_i = 0$. Hence the independence test applies.
Example 5.2.3
If is independent, show that
is also independent.
Solution:
If , collect terms to get
. Since
is independent this combination must be trivial; that is,
and
. These equations have only the trivial solution
, as required.
Example 5.2.4
Show that the zero vector in $\mathbb{R}^n$ does not belong to any independent set.
Solution:
No set $\{\mathbf{0}, \mathbf{x}_1, \dots, \mathbf{x}_k\}$ of vectors is independent because we have a vanishing, nontrivial linear combination
$1\mathbf{0} + 0\mathbf{x}_1 + \cdots + 0\mathbf{x}_k = \mathbf{0}$
Example 5.2.5
Given $\mathbf{x}$ in $\mathbb{R}^n$, show that $\{\mathbf{x}\}$ is independent if and only if $\mathbf{x} \neq \mathbf{0}$.
Solution:
A vanishing linear combination from $\{\mathbf{x}\}$ takes the form $t\mathbf{x} = \mathbf{0}$, $t$ in $\mathbb{R}$. This implies that $t = 0$ because $\mathbf{x} \neq \mathbf{0}$.
Example 5.2.6
Show that the nonzero rows of a row-echelon matrix are independent.
Solution:
We illustrate the case with $3$ leading $1$s; the general case is analogous. Suppose the row-echelon matrix $R$ has the form
$R = \begin{bmatrix} 0 & 1 & * & * & * \\ 0 & 0 & 0 & 1 & * \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$
where $*$ indicates a nonspecified number. Let $R_1$, $R_2$, and $R_3$ denote the nonzero rows of $R$. If $t_1R_1 + t_2R_2 + t_3R_3 = \mathbf{0}$ we show that $t_1 = 0$, then $t_2 = 0$, and finally $t_3 = 0$. The condition $t_1R_1 + t_2R_2 + t_3R_3 = \mathbf{0}$ becomes
$(0,\ t_1,\ t_1*,\ t_1* + t_2,\ t_1* + t_2* + t_3) = (0, 0, 0, 0, 0)$
Equating second entries shows that $t_1 = 0$, so the condition becomes $t_2R_2 + t_3R_3 = \mathbf{0}$. Now the same argument shows that $t_2 = 0$. Finally, this gives $t_3R_3 = \mathbf{0}$ and we obtain $t_3 = 0$.
A set of vectors in $\mathbb{R}^n$ is called linearly dependent (or simply dependent) if it is not linearly independent; equivalently, if some nontrivial linear combination vanishes.
Example 5.2.7
If $\mathbf{v}$ and $\mathbf{w}$ are nonzero vectors in $\mathbb{R}^3$, show that $\{\mathbf{v}, \mathbf{w}\}$ is dependent if and only if $\mathbf{v}$ and $\mathbf{w}$ are parallel.
Solution:
If $\mathbf{v}$ and $\mathbf{w}$ are parallel, then one is a scalar multiple of the other, say $\mathbf{v} = a\mathbf{w}$ for some scalar $a$. Then the nontrivial linear combination $\mathbf{v} - a\mathbf{w} = \mathbf{0}$ vanishes, so $\{\mathbf{v}, \mathbf{w}\}$ is dependent.
Conversely, if $\{\mathbf{v}, \mathbf{w}\}$ is dependent, let $s\mathbf{v} + t\mathbf{w} = \mathbf{0}$ be nontrivial, say $s \neq 0$. Then $\mathbf{v} = -\frac{t}{s}\mathbf{w}$, so $\mathbf{v}$ and $\mathbf{w}$ are parallel. A similar argument works if $t \neq 0$.
With this we can give a geometric description of what it means for a set $\{\mathbf{u}, \mathbf{v}, \mathbf{w}\}$ in $\mathbb{R}^3$ to be independent. Note that this requirement means that $\{\mathbf{v}, \mathbf{w}\}$ is also independent ($a\mathbf{v} + b\mathbf{w} = \mathbf{0}$ means that $0\mathbf{u} + a\mathbf{v} + b\mathbf{w} = \mathbf{0}$), so $M = \text{span}\{\mathbf{v}, \mathbf{w}\}$ is the plane containing $\mathbf{v}$, $\mathbf{w}$, and $\mathbf{0}$ (see the discussion preceding Example 5.1.4). So we assume that $\{\mathbf{v}, \mathbf{w}\}$ is independent in the following example.
Example 5.2.8
Let $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ be nonzero vectors in $\mathbb{R}^3$ where $\{\mathbf{v}, \mathbf{w}\}$ is independent. Show that $\{\mathbf{u}, \mathbf{v}, \mathbf{w}\}$ is independent if and only if $\mathbf{u}$ is not in the plane $M = \text{span}\{\mathbf{v}, \mathbf{w}\}$. This is illustrated in the diagrams.
Solution:
If $\{\mathbf{u}, \mathbf{v}, \mathbf{w}\}$ is independent, suppose $\mathbf{u}$ is in the plane $M = \text{span}\{\mathbf{v}, \mathbf{w}\}$, say $\mathbf{u} = a\mathbf{v} + b\mathbf{w}$, where $a$ and $b$ are in $\mathbb{R}$. Then $1\mathbf{u} - a\mathbf{v} - b\mathbf{w} = \mathbf{0}$, contradicting the independence of $\{\mathbf{u}, \mathbf{v}, \mathbf{w}\}$.
On the other hand, suppose that $\mathbf{u}$ is not in $M$; we must show that $\{\mathbf{u}, \mathbf{v}, \mathbf{w}\}$ is independent. If $r\mathbf{u} + s\mathbf{v} + t\mathbf{w} = \mathbf{0}$ where $r$, $s$, and $t$ are in $\mathbb{R}$, then $r = 0$ since otherwise $\mathbf{u} = -\frac{s}{r}\mathbf{v} - \frac{t}{r}\mathbf{w}$ is in $M$. But then $s\mathbf{v} + t\mathbf{w} = \mathbf{0}$, so $s = t = 0$ by our assumption. This shows that $\{\mathbf{u}, \mathbf{v}, \mathbf{w}\}$ is independent, as required.
By Theorem 2.4.5, the following conditions are equivalent for an $n \times n$ matrix $A$:
1. $A$ is invertible.
2. If $A\mathbf{x} = \mathbf{0}$ where $\mathbf{x}$ is in $\mathbb{R}^n$, then $\mathbf{x} = \mathbf{0}$.
3. $A\mathbf{x} = \mathbf{b}$ has a solution $\mathbf{x}$ for every vector $\mathbf{b}$ in $\mathbb{R}^n$.
While condition 1 makes no sense if $A$ is not square, conditions 2 and 3 are meaningful for any $m \times n$ matrix $A$ and, in fact, are related to independence and spanning. Indeed, if $\mathbf{c}_1, \mathbf{c}_2, \dots, \mathbf{c}_n$ are the columns of $A$, and if we write $\mathbf{x} = (x_1, x_2, \dots, x_n)$, then
$A\mathbf{x} = x_1\mathbf{c}_1 + x_2\mathbf{c}_2 + \cdots + x_n\mathbf{c}_n$
by Definition 2.5. Hence the definitions of independence and spanning show, respectively, that condition 2 is equivalent to the independence of $\{\mathbf{c}_1, \mathbf{c}_2, \dots, \mathbf{c}_n\}$ and condition 3 is equivalent to the requirement that $\text{span}\{\mathbf{c}_1, \mathbf{c}_2, \dots, \mathbf{c}_n\} = \mathbb{R}^m$. This discussion is summarized in the following theorem:
Theorem 5.2.2
If $A$ is an $m \times n$ matrix, let $\mathbf{c}_1, \mathbf{c}_2, \dots, \mathbf{c}_n$ denote the columns of $A$.
1. $\{\mathbf{c}_1, \mathbf{c}_2, \dots, \mathbf{c}_n\}$ is independent in $\mathbb{R}^m$ if and only if $A\mathbf{x} = \mathbf{0}$, $\mathbf{x}$ in $\mathbb{R}^n$, implies $\mathbf{x} = \mathbf{0}$.
2. $\mathbb{R}^m = \text{span}\{\mathbf{c}_1, \mathbf{c}_2, \dots, \mathbf{c}_n\}$ if and only if $A\mathbf{x} = \mathbf{b}$ has a solution $\mathbf{x}$ for every vector $\mathbf{b}$ in $\mathbb{R}^m$.
For a square matrix $A$, Theorem 5.2.2 characterizes the invertibility of $A$ in terms of the spanning and independence of its columns (see the discussion preceding Theorem 5.2.2). It is important to be able to discuss these notions for rows. If $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k$ are $1 \times n$ rows, we define $\text{span}\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k\}$ to be the set of all linear combinations of the $\mathbf{x}_i$ (as matrices), and we say that $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k\}$ is linearly independent if the only vanishing linear combination is the trivial one (that is, if $\{\mathbf{x}_1^T, \mathbf{x}_2^T, \dots, \mathbf{x}_k^T\}$ is independent in $\mathbb{R}^n$, as the reader can verify).
Theorem 5.2.3
The following are equivalent for an $n \times n$ matrix $A$:
1. $A$ is invertible.
2. The columns of $A$ are linearly independent.
3. The columns of $A$ span $\mathbb{R}^n$.
4. The rows of $A$ are linearly independent.
5. The rows of $A$ span the set of all $1 \times n$ rows.
Proof:
Let $\mathbf{c}_1, \mathbf{c}_2, \dots, \mathbf{c}_n$ denote the columns of $A$.
(1) $\Leftrightarrow$ (2). By Theorem 2.4.5, $A$ is invertible if and only if $A\mathbf{x} = \mathbf{0}$ implies $\mathbf{x} = \mathbf{0}$; this holds if and only if $\{\mathbf{c}_1, \dots, \mathbf{c}_n\}$ is independent by Theorem 5.2.2.
(1) $\Leftrightarrow$ (3). Again by Theorem 2.4.5, $A$ is invertible if and only if $A\mathbf{x} = \mathbf{b}$ has a solution for every column $\mathbf{b}$ in $\mathbb{R}^n$; this holds if and only if $\text{span}\{\mathbf{c}_1, \dots, \mathbf{c}_n\} = \mathbb{R}^n$ by Theorem 5.2.2.
(1) $\Leftrightarrow$ (4). The matrix $A$ is invertible if and only if $A^T$ is invertible (by Corollary 2.4.1 to Theorem 2.2.4); this in turn holds if and only if $A^T$ has independent columns (by (1) $\Leftrightarrow$ (2)); finally, this last statement holds if and only if $A$ has independent rows (because the rows of $A$ are the transposes of the columns of $A^T$).
(1) $\Leftrightarrow$ (5). The proof is similar to (1) $\Leftrightarrow$ (4).
Example 5.2.9
Show that the set $\{\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3\}$ of three given vectors is independent in $\mathbb{R}^3$.
Solution:
Consider the matrix $A$ with the three vectors as its rows. A routine computation shows that $\det A \neq 0$, so $A$ is invertible. Hence $\{\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3\}$ is independent by Theorem 5.2.3. Note that Theorem 5.2.3 also shows that $\mathbb{R}^3 = \text{span}\{\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3\}$.
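The determinant test is easy to run numerically; a sketch with placeholder rows (not necessarily the vectors of Example 5.2.9):

```python
import numpy as np

# Rows are three placeholder vectors in R^3.
A = np.array([[2., -2., 5.],
              [-3., 1., 1.],
              [2., 7., -4.]])

det = np.linalg.det(A)
print(det)                      # nonzero, so A is invertible
print(abs(det) > 1e-12)         # True: rows are independent and span R^3 (Theorem 5.2.3)
```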
Dimension
It is common geometrical language to say that $\mathbb{R}^3$ is 3-dimensional, that planes are 2-dimensional, and that lines are 1-dimensional. The next theorem is a basic tool for clarifying this idea of "dimension".
Theorem 5.2.4 Fundamental Theorem
Let $U$ be a subspace of $\mathbb{R}^n$. If $U$ is spanned by $m$ vectors, and if $U$ contains $k$ linearly independent vectors, then $k \leq m$.
Definition 5.4 Basis of $\mathbb{R}^n$
If $U$ is a subspace of $\mathbb{R}^n$, a set $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_m\}$ of vectors in $U$ is called a basis of $U$ if it satisfies the following two conditions:
1. $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_m\}$ is linearly independent.
2. $U = \text{span}\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_m\}$.
Theorem 5.2.5 Invariance Theorem
If $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_m\}$ and $\{\mathbf{y}_1, \mathbf{y}_2, \dots, \mathbf{y}_k\}$ are bases of a subspace $U$ of $\mathbb{R}^n$, then $m = k$.
Proof:
We have $k \leq m$ by the fundamental theorem because $\{\mathbf{x}_1, \dots, \mathbf{x}_m\}$ spans $U$, and $\{\mathbf{y}_1, \dots, \mathbf{y}_k\}$ is independent. Similarly, by interchanging the roles of the $\mathbf{x}$'s and $\mathbf{y}$'s we get $m \leq k$. Hence $m = k$.
Definition 5.5 Dimension of a Subspace of $\mathbb{R}^n$
If $U$ is a subspace of $\mathbb{R}^n$ and $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_m\}$ is any basis of $U$, the number, $m$, of vectors in the basis is called the dimension of $U$, denoted
$\dim U = m$
The importance of the invariance theorem is that the dimension of $U$ can be determined by counting the number of vectors in any basis.
Let $\{\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n\}$ denote the standard basis of $\mathbb{R}^n$, that is the set of columns of the identity matrix $I_n$. Then $\mathbb{R}^n = \text{span}\{\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n\}$ by Example 5.1.6, and $\{\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n\}$ is independent by Example 5.2.2. Hence it is indeed a basis of $\mathbb{R}^n$ in the present terminology, and we have
Example 5.2.10
$\dim(\mathbb{R}^n) = n$ and $\{\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n\}$ is a basis.
This agrees with our geometric sense that $\mathbb{R}^2$ is two-dimensional and $\mathbb{R}^3$ is three-dimensional. It also says that $\mathbb{R}^1 = \mathbb{R}$ is one-dimensional, and $\{1\}$ is a basis. Returning to subspaces of $\mathbb{R}^n$, we define
$\dim\{\mathbf{0}\} = 0$
This amounts to saying $\{\mathbf{0}\}$ has a basis containing no vectors. This makes sense because $\mathbf{0}$ cannot belong to any independent set (Example 5.2.4).
Example 5.2.11
Let . Show that
is a subspace of
, find a basis, and calculate
.
Solution:
Clearly,
where
and
. It follows that
, and hence that
is a subspace of
. Moreover, if
, then
so
. Hence
is independent, and so a basis of
. This means
.
While we have found bases in many subspaces of $\mathbb{R}^n$, we have not yet shown that every subspace has a basis.
Theorem 5.2.6
Let $U \neq \{\mathbf{0}\}$ be a subspace of $\mathbb{R}^n$. Then:
- $U$ has a basis and $\dim U \leq n$.
- Any independent set in $U$ can be enlarged (by adding vectors from any fixed basis of $U$) to a basis of $U$.
- Any spanning set for $U$ can be cut down (by deleting vectors) to a basis of $U$.
Example 5.2.12
Find a basis of $\mathbb{R}^4$ containing the independent set $\{\mathbf{u}, \mathbf{v}\}$, where $\mathbf{u}$ and $\mathbf{v}$ are the given vectors.
Solution:
By Theorem 5.2.6 we can find such a basis by adding vectors from the standard basis of $\mathbb{R}^4$ to $\{\mathbf{u}, \mathbf{v}\}$. If we try a coordinate vector $\mathbf{e}_i$, we find easily that $\{\mathbf{e}_i, \mathbf{u}, \mathbf{v}\}$ is independent. Now add another vector from the standard basis, say $\mathbf{e}_j$.
Again we find that $B = \{\mathbf{e}_i, \mathbf{e}_j, \mathbf{u}, \mathbf{v}\}$ is independent. Since $B$ has $4 = \dim \mathbb{R}^4$ vectors, then $B$ must span $\mathbb{R}^4$ by Theorem 5.2.7 below (or simply verify it directly). Hence $B$ is a basis of $\mathbb{R}^4$.
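One way to automate this extension (a sketch; the pair $\mathbf{u}, \mathbf{v}$ below is hypothetical, chosen only so that it is independent):

```python
from sympy import Matrix, eye

def extend_to_basis(vectors, n):
    """Greedily add standard basis vectors of R^n to an independent list,
    keeping only those that increase the rank (Theorem 5.2.6)."""
    basis = [Matrix(v) for v in vectors]
    for j in range(n):
        e_j = eye(n)[:, j]
        if Matrix.hstack(*basis, e_j).rank() > Matrix.hstack(*basis).rank():
            basis.append(e_j)
        if len(basis) == n:
            break
    return basis

# Hypothetical independent pair in R^4.
u, v = [1, 1, 1, 1], [0, 1, 1, 1]
B = extend_to_basis([u, v], 4)
print(len(B), Matrix.hstack(*B).rank())   # 4 4  -> B is a basis of R^4
```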
Theorem 5.2.7
Let $U$ be a subspace of $\mathbb{R}^n$ where $\dim U = m$ and let $B = \{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_m\}$ be a set of $m$ vectors in $U$. Then $B$ is independent if and only if $B$ spans $U$.
Proof:
Suppose $B$ is independent. If $B$ does not span $U$ then, by Theorem 5.2.6, $B$ can be enlarged to a basis of $U$ containing more than $m$ vectors. This contradicts the invariance theorem because $\dim U = m$, so $B$ spans $U$. Conversely, if $B$ spans $U$ but is not independent, then $B$ can be cut down to a basis of $U$ containing fewer than $m$ vectors, again a contradiction. So $B$ is independent, as required.
Theorem 5.2.8
Let $U \subseteq W$ be subspaces of $\mathbb{R}^n$. Then:
- $\dim U \leq \dim W$.
- If $\dim U = \dim W$, then $U = W$.
Proof:
Write $\dim W = k$, and let $B$ be a basis of $U$.
- If $\dim U > k$, then $B$ is an independent set in $W$ containing more than $k$ vectors, contradicting the fundamental theorem. So $\dim U \leq k = \dim W$.
- If $\dim U = k$, then $B$ is an independent set in $W$ containing $k = \dim W$ vectors, so $B$ spans $W$ by Theorem 5.2.7. Hence $W = \text{span } B = U$, proving (2).
It follows from Theorem 5.2.8 that if $U$ is a subspace of $\mathbb{R}^n$, then $\dim U$ is one of the integers $0, 1, 2, \dots, n$, and that:
$\dim U = 0$ if and only if $U = \{\mathbf{0}\}$, and $\dim U = n$ if and only if $U = \mathbb{R}^n$
The other subspaces of $\mathbb{R}^n$ are called proper. The following example uses Theorem 5.2.8 to show that the proper subspaces of $\mathbb{R}^2$ are the lines through the origin, while the proper subspaces of $\mathbb{R}^3$ are the lines and planes through the origin.
Example 5.2.14
- If $U$ is a subspace of $\mathbb{R}^2$ or $\mathbb{R}^3$, then $\dim U = 1$ if and only if $U$ is a line through the origin.
- If $U$ is a subspace of $\mathbb{R}^3$, then $\dim U = 2$ if and only if $U$ is a plane through the origin.
Solution:
- Since $\dim U = 1$, let $\{\mathbf{u}\}$ be a basis of $U$. Then $U = \text{span}\{\mathbf{u}\} = \{t\mathbf{u} \mid t \in \mathbb{R}\}$, so $U$ is the line through the origin with direction vector $\mathbf{u}$. Conversely each line $L$ with direction vector $\mathbf{d} \neq \mathbf{0}$ has the form $L = \{t\mathbf{d} \mid t \in \mathbb{R}\}$. Hence $\{\mathbf{d}\}$ is a basis of $L$, so $L$ has dimension 1.
- If $U \subseteq \mathbb{R}^3$ has dimension 2, let $\{\mathbf{v}, \mathbf{w}\}$ be a basis of $U$. Then $\mathbf{v}$ and $\mathbf{w}$ are not parallel (by Example 5.2.7) so $\mathbf{n} = \mathbf{v} \times \mathbf{w} \neq \mathbf{0}$. Let $P = \{\mathbf{x} \in \mathbb{R}^3 \mid \mathbf{n} \cdot \mathbf{x} = 0\}$ denote the plane through the origin with normal $\mathbf{n}$. Then $P$ is a subspace of $\mathbb{R}^3$ (Example 5.1.1) and both $\mathbf{v}$ and $\mathbf{w}$ lie in $P$ (they are orthogonal to $\mathbf{n}$), so $U = \text{span}\{\mathbf{v}, \mathbf{w}\} \subseteq P$ by Theorem 5.1.1. Hence
$U \subseteq P \subseteq \mathbb{R}^3$
Since $\dim U = 2$ and $\dim(\mathbb{R}^3) = 3$, it follows from Theorem 5.2.8 that $\dim P = 2$ or $3$, whence $P = U$ or $\mathbb{R}^3$. But $P \neq \mathbb{R}^3$ (for example, $\mathbf{n}$ is not in $P$) and so $U = P$ is a plane through the origin.
Conversely, if $U$ is a plane through the origin, then $\dim U = 0, 1, 2$, or $3$ by Theorem 5.2.8. But $\dim U \neq 0$ or $3$ because $U \neq \{\mathbf{0}\}$ and $U \neq \mathbb{R}^3$, and $\dim U \neq 1$ by (1). So $\dim U = 2$.
5.3 Orthogonality
Length and orthogonality are basic concepts in geometry and, in $\mathbb{R}^2$ and $\mathbb{R}^3$, they both can be defined using the dot product. In this section we extend the dot product to vectors in $\mathbb{R}^n$, and so endow $\mathbb{R}^n$ with euclidean geometry. We then introduce the idea of an orthogonal basis, one of the most useful concepts in linear algebra, and begin exploring some of its applications.
Dot Product, Length, and Distance
If $\mathbf{x} = (x_1, x_2, \dots, x_n)$ and $\mathbf{y} = (y_1, y_2, \dots, y_n)$ are two $n$-tuples in $\mathbb{R}^n$, recall that their dot product was defined in Section 2.2 as follows:
$\mathbf{x} \cdot \mathbf{y} = x_1y_1 + x_2y_2 + \cdots + x_ny_n$
Observe that if $\mathbf{x}$ and $\mathbf{y}$ are written as columns then $\mathbf{x} \cdot \mathbf{y} = \mathbf{x}^T\mathbf{y}$ is a matrix product (and $\mathbf{x} \cdot \mathbf{y} = \mathbf{x}\mathbf{y}^T$ if they are written as rows). Here $\mathbf{x}^T\mathbf{y}$ is a $1 \times 1$ matrix, which we take to be a number.
Definition 5.6 Length in $\mathbb{R}^n$
As in $\mathbb{R}^3$, the length $\|\mathbf{x}\|$ of the vector $\mathbf{x}$ is defined by
$\|\mathbf{x}\| = \sqrt{\mathbf{x} \cdot \mathbf{x}} = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}$
where $\sqrt{\ }$ indicates the positive square root.
A vector $\mathbf{x}$ of length $1$ is called a unit vector. If $\mathbf{x} \neq \mathbf{0}$, then $\|\mathbf{x}\| \neq 0$ and it follows easily that $\frac{1}{\|\mathbf{x}\|}\mathbf{x}$ is a unit vector (see Theorem 5.3.6 below), a fact that we shall use later.
Example 5.3.1
If and
in
, then
and
. Hence
is a unit vector; similarly
is a unit vector.
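These computations are easy to check numerically; a sketch with placeholder vectors (not necessarily the ones in Example 5.3.1):

```python
import numpy as np

x = np.array([1., -1., 3., 1.])   # placeholder vectors in R^4
y = np.array([2., 1., 1., 0.])

print(np.dot(x, y))               # the dot product x . y
print(np.linalg.norm(x))          # the length ||x|| = sqrt(x . x)
u = x / np.linalg.norm(x)         # normalizing x gives a unit vector
print(np.linalg.norm(u))          # 1.0
```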
Theorem 5.3.1
Let $\mathbf{x}$, $\mathbf{y}$, and $\mathbf{z}$ denote vectors in $\mathbb{R}^n$. Then:
1. $\mathbf{x} \cdot \mathbf{y} = \mathbf{y} \cdot \mathbf{x}$.
2. $\mathbf{x} \cdot (\mathbf{y} + \mathbf{z}) = \mathbf{x} \cdot \mathbf{y} + \mathbf{x} \cdot \mathbf{z}$.
3. $(a\mathbf{x}) \cdot \mathbf{y} = a(\mathbf{x} \cdot \mathbf{y}) = \mathbf{x} \cdot (a\mathbf{y})$ for all scalars $a$.
4. $\mathbf{x} \cdot \mathbf{x} = \|\mathbf{x}\|^2$.
5. $\|\mathbf{x}\| \geq 0$, and $\|\mathbf{x}\| = 0$ if and only if $\mathbf{x} = \mathbf{0}$.
6. $\|a\mathbf{x}\| = |a|\,\|\mathbf{x}\|$ for all scalars $a$.
Proof:
(1), (2), and (3) follow from matrix arithmetic because $\mathbf{x} \cdot \mathbf{y} = \mathbf{x}^T\mathbf{y}$; (4) is clear from the definition; and (6) is a routine verification since $|a| = \sqrt{a^2}$. If $\mathbf{x} = (x_1, x_2, \dots, x_n)$, then $\|\mathbf{x}\|^2 = x_1^2 + x_2^2 + \cdots + x_n^2$, so $\|\mathbf{x}\| = 0$ if and only if $x_1^2 + x_2^2 + \cdots + x_n^2 = 0$. Since each $x_i$ is a real number this happens if and only if $x_i = 0$ for each $i$; that is, if and only if $\mathbf{x} = \mathbf{0}$. This proves (5).
Because of Theorem 5.3.1, computations with dot products in $\mathbb{R}^n$ are similar to those in $\mathbb{R}^3$. In particular, the dot product
$(\mathbf{x}_1 + \mathbf{x}_2 + \cdots + \mathbf{x}_m) \cdot (\mathbf{y}_1 + \mathbf{y}_2 + \cdots + \mathbf{y}_k)$
equals the sum of $mk$ terms, $\mathbf{x}_i \cdot \mathbf{y}_j$, one for each choice of $i$ and $j$. For example:
$(3\mathbf{x} - 4\mathbf{y}) \cdot (7\mathbf{x} + 2\mathbf{y}) = 21(\mathbf{x} \cdot \mathbf{x}) + 6(\mathbf{x} \cdot \mathbf{y}) - 28(\mathbf{y} \cdot \mathbf{x}) - 8(\mathbf{y} \cdot \mathbf{y}) = 21\|\mathbf{x}\|^2 - 22(\mathbf{x} \cdot \mathbf{y}) - 8\|\mathbf{y}\|^2$
holds for all vectors $\mathbf{x}$ and $\mathbf{y}$.
Example 5.3.2
Show that $\|\mathbf{x} + \mathbf{y}\|^2 = \|\mathbf{x}\|^2 + 2(\mathbf{x} \cdot \mathbf{y}) + \|\mathbf{y}\|^2$ for any $\mathbf{x}$ and $\mathbf{y}$ in $\mathbb{R}^n$.
Solution:
Using Theorem 5.3.1 several times:
$\|\mathbf{x} + \mathbf{y}\|^2 = (\mathbf{x} + \mathbf{y}) \cdot (\mathbf{x} + \mathbf{y}) = \mathbf{x} \cdot \mathbf{x} + \mathbf{x} \cdot \mathbf{y} + \mathbf{y} \cdot \mathbf{x} + \mathbf{y} \cdot \mathbf{y} = \|\mathbf{x}\|^2 + 2(\mathbf{x} \cdot \mathbf{y}) + \|\mathbf{y}\|^2$
Example 5.3.3
Suppose that $\mathbb{R}^n = \text{span}\{\mathbf{f}_1, \mathbf{f}_2, \dots, \mathbf{f}_k\}$ for some vectors $\mathbf{f}_i$. If $\mathbf{x} \cdot \mathbf{f}_i = 0$ for each $i$ where $\mathbf{x}$ is in $\mathbb{R}^n$, show that $\mathbf{x} = \mathbf{0}$.
Solution:
We show $\mathbf{x} = \mathbf{0}$ by showing that $\|\mathbf{x}\| = 0$ and using (5) of Theorem 5.3.1. Since the $\mathbf{f}_i$ span $\mathbb{R}^n$, write $\mathbf{x} = t_1\mathbf{f}_1 + t_2\mathbf{f}_2 + \cdots + t_k\mathbf{f}_k$ where the $t_i$ are in $\mathbb{R}$. Then
$\|\mathbf{x}\|^2 = \mathbf{x} \cdot \mathbf{x} = \mathbf{x} \cdot (t_1\mathbf{f}_1 + \cdots + t_k\mathbf{f}_k) = t_1(\mathbf{x} \cdot \mathbf{f}_1) + \cdots + t_k(\mathbf{x} \cdot \mathbf{f}_k) = t_1(0) + \cdots + t_k(0) = 0$
We saw in Section 4.2 that if $\mathbf{u}$ and $\mathbf{v}$ are nonzero vectors in $\mathbb{R}^3$, then
$\frac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\|\|\mathbf{v}\|} = \cos\theta$
where $\theta$ is the angle between $\mathbf{u}$ and $\mathbf{v}$. Since $|\cos\theta| \leq 1$ for any angle $\theta$, this shows that $|\mathbf{u} \cdot \mathbf{v}| \leq \|\mathbf{u}\|\|\mathbf{v}\|$. In this form the result holds in $\mathbb{R}^n$.
Theorem 5.3.2 Cauchy Inequality
If $\mathbf{x}$ and $\mathbf{y}$ are vectors in $\mathbb{R}^n$, then
$|\mathbf{x} \cdot \mathbf{y}| \leq \|\mathbf{x}\|\|\mathbf{y}\|$
Moreover $|\mathbf{x} \cdot \mathbf{y}| = \|\mathbf{x}\|\|\mathbf{y}\|$ if and only if one of $\mathbf{x}$ and $\mathbf{y}$ is a multiple of the other.
Proof:
The inequality holds if $\mathbf{x} = \mathbf{0}$ or $\mathbf{y} = \mathbf{0}$ (in fact it is equality). Otherwise, write $\|\mathbf{x}\| = a > 0$ and $\|\mathbf{y}\| = b > 0$ for convenience. A computation like that preceding Example 5.3.2 gives
$\|b\mathbf{x} - a\mathbf{y}\|^2 = 2ab(ab - \mathbf{x} \cdot \mathbf{y})$ and $\|b\mathbf{x} + a\mathbf{y}\|^2 = 2ab(ab + \mathbf{x} \cdot \mathbf{y})$ (5.1)
It follows that $ab - \mathbf{x} \cdot \mathbf{y} \geq 0$ and $ab + \mathbf{x} \cdot \mathbf{y} \geq 0$, and hence that $-ab \leq \mathbf{x} \cdot \mathbf{y} \leq ab$. Hence $|\mathbf{x} \cdot \mathbf{y}| \leq ab = \|\mathbf{x}\|\|\mathbf{y}\|$, proving the Cauchy inequality.
If equality holds, then $|\mathbf{x} \cdot \mathbf{y}| = ab$, so $\mathbf{x} \cdot \mathbf{y} = ab$ or $\mathbf{x} \cdot \mathbf{y} = -ab$. Hence Equation (5.1) shows that $b\mathbf{x} - a\mathbf{y} = \mathbf{0}$ or $b\mathbf{x} + a\mathbf{y} = \mathbf{0}$, so one of $\mathbf{x}$ and $\mathbf{y}$ is a multiple of the other (even if $a = 0$ or $b = 0$).
There is an important consequence of the Cauchy inequality. Given $\mathbf{x}$ and $\mathbf{y}$ in $\mathbb{R}^n$, use Example 5.3.2 and the fact that $\mathbf{x} \cdot \mathbf{y} \leq \|\mathbf{x}\|\|\mathbf{y}\|$ to compute
$\|\mathbf{x} + \mathbf{y}\|^2 = \|\mathbf{x}\|^2 + 2(\mathbf{x} \cdot \mathbf{y}) + \|\mathbf{y}\|^2 \leq \|\mathbf{x}\|^2 + 2\|\mathbf{x}\|\|\mathbf{y}\| + \|\mathbf{y}\|^2 = (\|\mathbf{x}\| + \|\mathbf{y}\|)^2$
Taking positive square roots gives:
Corollary 5.3.1 Triangle Inequality
If $\mathbf{x}$ and $\mathbf{y}$ are vectors in $\mathbb{R}^n$, then $\|\mathbf{x} + \mathbf{y}\| \leq \|\mathbf{x}\| + \|\mathbf{y}\|$.
The reason for the name comes from the observation that in $\mathbb{R}^3$ the inequality asserts that the sum of the lengths of two sides of a triangle is not less than the length of the third side.
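Both inequalities are easy to sanity-check numerically; a sketch with random vectors (the vectors and tolerance are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.normal(size=5), rng.normal(size=5)    # two random vectors in R^5

lhs_cauchy = abs(np.dot(x, y))
rhs_cauchy = np.linalg.norm(x) * np.linalg.norm(y)
print(lhs_cauchy <= rhs_cauchy + 1e-12)          # True: Cauchy inequality

lhs_tri = np.linalg.norm(x + y)
rhs_tri = np.linalg.norm(x) + np.linalg.norm(y)
print(lhs_tri <= rhs_tri + 1e-12)                # True: triangle inequality

# Equality in Cauchy occurs when one vector is a multiple of the other:
print(np.isclose(abs(np.dot(x, 3 * x)),
                 np.linalg.norm(x) * np.linalg.norm(3 * x)))   # True
```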
Definition 5.7 Distance in $\mathbb{R}^n$
If $\mathbf{x}$ and $\mathbf{y}$ are two vectors in $\mathbb{R}^n$, we define the distance $d(\mathbf{x}, \mathbf{y})$ between $\mathbf{x}$ and $\mathbf{y}$ by
$d(\mathbf{x}, \mathbf{y}) = \|\mathbf{x} - \mathbf{y}\|$
Theorem 5.3.3
If $\mathbf{x}$, $\mathbf{y}$, and $\mathbf{z}$ are three vectors in $\mathbb{R}^n$ we have:
1. $d(\mathbf{x}, \mathbf{y}) \geq 0$ for all $\mathbf{x}$ and $\mathbf{y}$.
2. $d(\mathbf{x}, \mathbf{y}) = 0$ if and only if $\mathbf{x} = \mathbf{y}$.
3. $d(\mathbf{x}, \mathbf{y}) = d(\mathbf{y}, \mathbf{x})$ for all $\mathbf{x}$ and $\mathbf{y}$.
4. $d(\mathbf{x}, \mathbf{z}) \leq d(\mathbf{x}, \mathbf{y}) + d(\mathbf{y}, \mathbf{z})$ for all $\mathbf{x}$, $\mathbf{y}$, and $\mathbf{z}$. (Triangle inequality)
Proof:
(1) and (2) restate part (5) of Theorem 5.3.1 because $d(\mathbf{x}, \mathbf{y}) = \|\mathbf{x} - \mathbf{y}\|$, and (3) follows because $\|\mathbf{u}\| = \|-\mathbf{u}\|$ for every vector $\mathbf{u}$ in $\mathbb{R}^n$. To prove (4) use the Corollary to Theorem 5.3.2:
$d(\mathbf{x}, \mathbf{z}) = \|\mathbf{x} - \mathbf{z}\| = \|(\mathbf{x} - \mathbf{y}) + (\mathbf{y} - \mathbf{z})\| \leq \|\mathbf{x} - \mathbf{y}\| + \|\mathbf{y} - \mathbf{z}\| = d(\mathbf{x}, \mathbf{y}) + d(\mathbf{y}, \mathbf{z})$
Orthogonal Sets and the Expansion Theorem
Definition 5.8 Orthogonal and Orthonormal Sets
We say that two vectors $\mathbf{x}$ and $\mathbf{y}$ in $\mathbb{R}^n$ are orthogonal if $\mathbf{x} \cdot \mathbf{y} = 0$, extending the terminology in $\mathbb{R}^3$ (see Theorem 4.2.3). More generally, a set $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k\}$ of vectors in $\mathbb{R}^n$ is called an orthogonal set if
$\mathbf{x}_i \cdot \mathbf{x}_j = 0$ for all $i \neq j$, and $\mathbf{x}_i \neq \mathbf{0}$ for all $i$
Note that $\{\mathbf{x}\}$ is an orthogonal set if $\mathbf{x} \neq \mathbf{0}$. A set $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k\}$ of vectors in $\mathbb{R}^n$ is called orthonormal if it is orthogonal and, in addition, each $\mathbf{x}_i$ is a unit vector:
$\|\mathbf{x}_i\| = 1$ for each $i$
Example 5.3.4
The standard basis $\{\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n\}$ is an orthonormal set in $\mathbb{R}^n$.
Example 5.3.5
If $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k\}$ is orthogonal, so also is $\{a_1\mathbf{x}_1, a_2\mathbf{x}_2, \dots, a_k\mathbf{x}_k\}$ for any nonzero scalars $a_i$.
If $\mathbf{x} \neq \mathbf{0}$, it follows from item (6) of Theorem 5.3.1 that $\frac{1}{\|\mathbf{x}\|}\mathbf{x}$ is a unit vector, that is it has length $1$.
Definition 5.9 Normalizing an Orthogonal Set
If $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k\}$ is an orthogonal set, then $\left\{\frac{1}{\|\mathbf{x}_1\|}\mathbf{x}_1, \frac{1}{\|\mathbf{x}_2\|}\mathbf{x}_2, \dots, \frac{1}{\|\mathbf{x}_k\|}\mathbf{x}_k\right\}$ is an orthonormal set, and we say that it is the result of normalizing the orthogonal set $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k\}$.
Example 5.3.6
If ,
,
, and
then is an orthogonal set in
as is easily verified. After normalizing, the corresponding orthonormal set is
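The normalization step is mechanical; a sketch using a placeholder orthogonal set (not necessarily the set of Example 5.3.6):

```python
import numpy as np

# A placeholder orthogonal set in R^4 (pairwise dot products are zero).
f1 = np.array([ 1., 1., 1., -1.])
f2 = np.array([ 1., 0., 1.,  2.])
f3 = np.array([-1., 0., 1.,  0.])

F = [f1, f2, f3]
for a in F:
    for b in F:
        if a is not b:
            assert np.isclose(np.dot(a, b), 0.0)   # orthogonality check

orthonormal = [f / np.linalg.norm(f) for f in F]   # normalize each vector
print([round(np.linalg.norm(f), 10) for f in orthonormal])   # [1.0, 1.0, 1.0]
```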
The most important result about orthogonality is Pythagoras' theorem. Given orthogonal vectors $\mathbf{v}$ and $\mathbf{w}$ in $\mathbb{R}^3$, it asserts that $\|\mathbf{v} + \mathbf{w}\|^2 = \|\mathbf{v}\|^2 + \|\mathbf{w}\|^2$. In this form the result holds for any orthogonal set in $\mathbb{R}^n$.
Theorem 5.3.4 Pythagoras' Theorem
If $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k\}$ is an orthogonal set in $\mathbb{R}^n$, then
$\|\mathbf{x}_1 + \mathbf{x}_2 + \cdots + \mathbf{x}_k\|^2 = \|\mathbf{x}_1\|^2 + \|\mathbf{x}_2\|^2 + \cdots + \|\mathbf{x}_k\|^2$
Proof:
The fact that $\mathbf{x}_i \cdot \mathbf{x}_j = 0$ whenever $i \neq j$ gives
$\|\mathbf{x}_1 + \cdots + \mathbf{x}_k\|^2 = (\mathbf{x}_1 + \cdots + \mathbf{x}_k) \cdot (\mathbf{x}_1 + \cdots + \mathbf{x}_k) = (\|\mathbf{x}_1\|^2 + \cdots + \|\mathbf{x}_k\|^2) + \sum_{i \neq j} \mathbf{x}_i \cdot \mathbf{x}_j = \|\mathbf{x}_1\|^2 + \cdots + \|\mathbf{x}_k\|^2$
This is what we wanted.
If $\mathbf{v}$ and $\mathbf{w}$ are orthogonal, nonzero vectors in $\mathbb{R}^3$, then they are certainly not parallel, and so are linearly independent by Example 5.2.7. The next theorem gives a far-reaching extension of this observation.
Theorem 5.3.5
Every orthogonal set in $\mathbb{R}^n$ is linearly independent.
Proof:
Let $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k\}$ be an orthogonal set in $\mathbb{R}^n$ and suppose a linear combination vanishes, say: $t_1\mathbf{x}_1 + t_2\mathbf{x}_2 + \cdots + t_k\mathbf{x}_k = \mathbf{0}$. Then
$0 = \mathbf{x}_1 \cdot \mathbf{0} = \mathbf{x}_1 \cdot (t_1\mathbf{x}_1 + t_2\mathbf{x}_2 + \cdots + t_k\mathbf{x}_k) = t_1\|\mathbf{x}_1\|^2 + t_2(\mathbf{x}_1 \cdot \mathbf{x}_2) + \cdots + t_k(\mathbf{x}_1 \cdot \mathbf{x}_k) = t_1\|\mathbf{x}_1\|^2$
Since $\|\mathbf{x}_1\|^2 \neq 0$, this implies that $t_1 = 0$. Similarly $t_i = 0$ for each $i$.
Theorem 5.3.6 Expansion Theorem
Let $\{\mathbf{f}_1, \mathbf{f}_2, \dots, \mathbf{f}_m\}$ be an orthogonal basis of a subspace $U$ of $\mathbb{R}^n$. If $\mathbf{x}$ is any vector in $U$, we have
$\mathbf{x} = \frac{\mathbf{x} \cdot \mathbf{f}_1}{\|\mathbf{f}_1\|^2}\mathbf{f}_1 + \frac{\mathbf{x} \cdot \mathbf{f}_2}{\|\mathbf{f}_2\|^2}\mathbf{f}_2 + \cdots + \frac{\mathbf{x} \cdot \mathbf{f}_m}{\|\mathbf{f}_m\|^2}\mathbf{f}_m$
Proof:
Since $\{\mathbf{f}_1, \dots, \mathbf{f}_m\}$ spans $U$, we have $\mathbf{x} = t_1\mathbf{f}_1 + t_2\mathbf{f}_2 + \cdots + t_m\mathbf{f}_m$ where the $t_i$ are scalars. To find $t_1$ we take the dot product of both sides with $\mathbf{f}_1$:
$\mathbf{x} \cdot \mathbf{f}_1 = (t_1\mathbf{f}_1 + t_2\mathbf{f}_2 + \cdots + t_m\mathbf{f}_m) \cdot \mathbf{f}_1 = t_1(\mathbf{f}_1 \cdot \mathbf{f}_1) = t_1\|\mathbf{f}_1\|^2$
Since $\mathbf{f}_1 \neq \mathbf{0}$, this gives $t_1 = \frac{\mathbf{x} \cdot \mathbf{f}_1}{\|\mathbf{f}_1\|^2}$. Similarly, $t_i = \frac{\mathbf{x} \cdot \mathbf{f}_i}{\|\mathbf{f}_i\|^2}$ for each $i$.
The expansion in Theorem 5.3.6 of $\mathbf{x}$ as a linear combination of the orthogonal basis $\{\mathbf{f}_1, \mathbf{f}_2, \dots, \mathbf{f}_m\}$ is called the Fourier expansion of $\mathbf{x}$, and the coefficients $t_i = \frac{\mathbf{x} \cdot \mathbf{f}_i}{\|\mathbf{f}_i\|^2}$ are called the Fourier coefficients. Note that if $\{\mathbf{f}_1, \mathbf{f}_2, \dots, \mathbf{f}_m\}$ is actually orthonormal, then $t_i = \mathbf{x} \cdot \mathbf{f}_i$ for each $i$.
Example 5.3.7
Expand as a linear combination of the orthogonal basis
of
given in Example 5.3.6.
Solution:
We have ,
,
, and
so the Fourier coefficients are
The reader can verify that indeed .
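Each Fourier coefficient is one dot product and one squared norm, so the expansion is easy to verify in code. A sketch with a placeholder orthogonal basis of $\mathbb{R}^4$ (assumed for illustration, not necessarily the basis of Example 5.3.6):

```python
import numpy as np

# A placeholder orthogonal basis of R^4 (pairwise dot products are zero).
F = [np.array([ 1.,  1., 1., -1.]),
     np.array([ 1.,  0., 1.,  2.]),
     np.array([-1.,  0., 1.,  0.]),
     np.array([ 1., -3., 1., -1.])]

x = np.array([2., -1., 0., 3.])            # an arbitrary vector to expand

# Fourier coefficients t_i = (x . f_i) / ||f_i||^2  (Theorem 5.3.6)
t = [np.dot(x, f) / np.dot(f, f) for f in F]
reconstruction = sum(ti * fi for ti, fi in zip(t, F))
print(np.allclose(reconstruction, x))      # True: the expansion recovers x
```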
5.4 Rank of a Matrix
In this section we use the concept of dimension to clarify the definition of the rank of a matrix given in Section 1.2, and to study its properties. This requires that we deal with rows and columns in the same way. While it has been the custom to write the $n$-tuples in $\mathbb{R}^n$ as columns, in this section we will frequently write them as rows. Subspaces, independence, spanning, and dimension are defined for rows using matrix operations, just as for columns. If $A$ is an $m \times n$ matrix, we define:
Definition 5.10 Column and Row Space of a Matrix
The column space, $\text{col } A$, of $A$ is the subspace of $\mathbb{R}^m$ spanned by the columns of $A$.
The row space, $\text{row } A$, of $A$ is the subspace of $\mathbb{R}^n$ spanned by the rows of $A$.
Lemma 5.4.1
Let $A$ and $B$ denote $m \times n$ matrices.
- If $A \to B$ by elementary row operations, then $\text{row } A = \text{row } B$.
- If $A \to B$ by elementary column operations, then $\text{col } A = \text{col } B$.
Proof:
We prove (1); the proof of (2) is analogous. It is enough to do it in the case when $A \to B$ by a single row operation. Let $R_1, R_2, \dots, R_m$ denote the rows of $A$. The row operation $A \to B$ either interchanges two rows, multiplies a row by a nonzero constant, or adds a multiple of a row to a different row. We leave the first two cases to the reader. In the last case, suppose that $a$ times row $p$ is added to row $q$ where $p < q$. Then the rows of $B$ are $R_1, \dots, R_p, \dots, R_q + aR_p, \dots, R_m$, and Theorem 5.1.1 shows that
$\text{span}\{R_1, \dots, R_p, \dots, R_q, \dots, R_m\} = \text{span}\{R_1, \dots, R_p, \dots, R_q + aR_p, \dots, R_m\}$
That is, $\text{row } A = \text{row } B$.
If $A$ is any matrix, we can carry $A \to R$ by elementary row operations where $R$ is a row-echelon matrix. Hence $\text{row } A = \text{row } R$ by Lemma 5.4.1; so the first part of the following result is of interest.
Lemma 5.4.2
If $R$ is a row-echelon matrix, then
- The nonzero rows of $R$ are a basis of $\text{row } R$.
- The columns of $R$ containing leading ones are a basis of $\text{col } R$.
Proof:
The nonzero rows of $R$ are independent by Example 5.2.6, and they span $\text{row } R$ by definition. This proves (1).
Let $\mathbf{c}_{j_1}, \mathbf{c}_{j_2}, \dots, \mathbf{c}_{j_r}$ denote the columns of $R$ containing leading $1$s. Then $\{\mathbf{c}_{j_1}, \mathbf{c}_{j_2}, \dots, \mathbf{c}_{j_r}\}$ is independent because the leading $1$s are in different rows (and have zeros below and to the left of them). Let $U$ denote the subspace of all columns in $\mathbb{R}^m$ in which the last $m - r$ entries are zero. Then $\dim U = r$ (it is just $\mathbb{R}^r$ with extra zeros). Hence the independent set $\{\mathbf{c}_{j_1}, \dots, \mathbf{c}_{j_r}\}$ is a basis of $U$ by Theorem 5.2.7. Since each column of $R$ is in $U$, it follows that $\text{col } R = U = \text{span}\{\mathbf{c}_{j_1}, \dots, \mathbf{c}_{j_r}\}$, proving (2).
Let $A$ be any matrix and suppose $A$ is carried to some row-echelon matrix $R$ by row operations. Note that $R$ is not unique. In Section 1.2 we defined the rank of $A$, denoted $\text{rank } A$, to be the number of leading $1$s in $R$, that is the number of nonzero rows of $R$. The fact that this number does not depend on the choice of $R$ was not proved. However, part 1 of Lemma 5.4.2 shows that
$\text{rank } A = \dim(\text{row } A)$
and hence that $\text{rank } A$ is independent of $R$.
Lemma 5.4.2 can be used to find bases of subspaces of $\mathbb{R}^n$ (written as rows). Here is an example.
Example 5.4.1
Find a basis of .
Solution:
is the row space of
. This matrix has row-echelon form
, so
is a basis of $U$
by Lemma 5.4.1.
Note that is another basis that avoids fractions.
Theorem 5.4.1 Rank Theorem
Let $A$ denote any $m \times n$ matrix of rank $r$. Then
$\dim(\text{col } A) = \dim(\text{row } A) = r$
Moreover, if $A$ is carried to a row-echelon matrix $R$ by row operations, then
- The $r$ nonzero rows of $R$ are a basis of $\text{row } A$.
- If the leading $1$s lie in columns $j_1, j_2, \dots, j_r$ of $R$, then columns $j_1, j_2, \dots, j_r$ of $A$ are a basis of $\text{col } A$.
Proof:
We have $\text{row } A = \text{row } R$ by Lemma 5.4.1, so (1) follows from Lemma 5.4.2. Moreover, $R = UA$ for some invertible matrix $U$. Now write $A = [\mathbf{c}_1\ \mathbf{c}_2\ \cdots\ \mathbf{c}_n]$ where $\mathbf{c}_1, \mathbf{c}_2, \dots, \mathbf{c}_n$ are the columns of $A$. Then
$R = UA = U[\mathbf{c}_1\ \mathbf{c}_2\ \cdots\ \mathbf{c}_n] = [U\mathbf{c}_1\ U\mathbf{c}_2\ \cdots\ U\mathbf{c}_n]$
Thus, in the notation of (2), the set $B = \{U\mathbf{c}_{j_1}, U\mathbf{c}_{j_2}, \dots, U\mathbf{c}_{j_r}\}$ is a basis of $\text{col } R$ by Lemma 5.4.2. So, to prove (2) and the fact that $\dim(\text{col } A) = r$, it is enough to show that $D = \{\mathbf{c}_{j_1}, \mathbf{c}_{j_2}, \dots, \mathbf{c}_{j_r}\}$ is a basis of $\text{col } A$. First, $D$ is linearly independent because $U$ is invertible (verify), so we show that, for each $j$, column $\mathbf{c}_j$ is a linear combination of the vectors in $D$. But $U\mathbf{c}_j$ is column $j$ of $R$, and so is a linear combination of the $U\mathbf{c}_{j_i}$, say
$U\mathbf{c}_j = a_1U\mathbf{c}_{j_1} + a_2U\mathbf{c}_{j_2} + \cdots + a_rU\mathbf{c}_{j_r}$
where each $a_i$ is a real number.
Since $U$ is invertible, it follows that $\mathbf{c}_j = a_1\mathbf{c}_{j_1} + a_2\mathbf{c}_{j_2} + \cdots + a_r\mathbf{c}_{j_r}$ and the proof is complete.
Example 5.4.2
Compute the rank of the matrix $A$ and find bases for $\text{row } A$ and $\text{col } A$.
Solution:
The reduction of $A$ to row-echelon form is as follows:
Hence $\text{rank } A = 2$, and the two nonzero rows of the row-echelon matrix are a basis of $\text{row } A$ by Lemma 5.4.2. Since the leading $1$s are in columns 1 and 3 of the row-echelon matrix, Theorem 5.4.1 shows that columns 1 and 3 of $A$ are a basis of $\text{col } A$.
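The whole computation can be reproduced in a few lines of SymPy; the matrix below is a hypothetical rank-2 example, not the matrix of Example 5.4.2:

```python
from sympy import Matrix

# A hypothetical 3x4 matrix of rank 2.
A = Matrix([[1, 2, 1, 3],
            [2, 4, 3, 7],
            [1, 2, 2, 4]])

R, pivots = A.rref()             # reduced row-echelon form and pivot column indices
print(A.rank())                  # 2
print(pivots)                    # (0, 2): leading 1s in columns 1 and 3 (0-indexed here)
print(R[:A.rank(), :])           # nonzero rows of R: a basis of row A (Theorem 5.4.1)

col_basis = Matrix.hstack(*[A.col(j) for j in pivots])
print(col_basis)                 # the corresponding columns of A: a basis of col A
```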
Corollary 5.4.1
If $A$ is any matrix, then $\text{rank } A = \text{rank}(A^T)$.
If $A$ is an $m \times n$ matrix, we have $\text{col } A \subseteq \mathbb{R}^m$ and $\text{row } A \subseteq \mathbb{R}^n$. Hence Theorem 5.2.8 shows that $\dim(\text{col } A) \leq m$ and $\dim(\text{row } A) \leq n$. Thus Theorem 5.4.1 gives:
Corollary 5.4.2
If $A$ is an $m \times n$ matrix, then $\text{rank } A \leq m$ and $\text{rank } A \leq n$.
Corollary 5.4.3
$\text{rank } A = \text{rank}(UA) = \text{rank}(AV)$ whenever $U$ and $V$ are invertible.
Proof:
Lemma 5.4.1 gives $\text{rank } A = \text{rank}(UA)$. Using this and Corollary 5.4.1 we get
$\text{rank}(AV) = \text{rank}[(AV)^T] = \text{rank}(V^TA^T) = \text{rank}(A^T) = \text{rank } A$
Lemma 5.4.3
Let $A$, $U$, and $V$ be matrices of sizes $m \times n$, $p \times m$, and $n \times q$ respectively.
1. $\text{col}(AV) \subseteq \text{col } A$, with equality if $VV' = I_n$ for some matrix $V'$.
2. $\text{row}(UA) \subseteq \text{row } A$, with equality if $U'U = I_m$ for some matrix $U'$.
Proof:
For (1), write $V = [\mathbf{v}_1\ \mathbf{v}_2\ \cdots\ \mathbf{v}_q]$ where $\mathbf{v}_j$ is column $j$ of $V$. Then we have $AV = [A\mathbf{v}_1\ A\mathbf{v}_2\ \cdots\ A\mathbf{v}_q]$, and each $A\mathbf{v}_j$ is in $\text{col } A$ by Definition 2.4. It follows that $\text{col}(AV) \subseteq \text{col } A$. If $VV' = I_n$, we obtain $\text{col } A = \text{col}[(AV)V'] \subseteq \text{col}(AV)$ in the same way. This proves (1).
As to (2), we have $\text{col}[(UA)^T] = \text{col}(A^TU^T) \subseteq \text{col}(A^T)$ by (1), from which $\text{row}(UA) \subseteq \text{row } A$. If $U'U = I_m$, this is equality as in the proof of (1).
Corollary 5.4.4
If $A$ is $m \times n$ and $B$ is $n \times m$, then $\text{rank } AB \leq \text{rank } A$ and $\text{rank } AB \leq \text{rank } B$.
Proof:
By Lemma 5.4.3, $\text{col}(AB) \subseteq \text{col } A$ and $\text{row}(AB) \subseteq \text{row } B$, so Theorem 5.4.1 applies.
In Section 5.1 we discussed two other subspaces associated with an $m \times n$ matrix $A$: the null space $\text{null}(A)$ and the image space $\text{im}(A)$,
$\text{null}(A) = \{\mathbf{x} \in \mathbb{R}^n \mid A\mathbf{x} = \mathbf{0}\}$ and $\text{im}(A) = \{A\mathbf{x} \mid \mathbf{x} \in \mathbb{R}^n\}$
Using rank, there are simple ways to find bases of these spaces. If $A$ has rank $r$, we have $\text{im}(A) = \text{col}(A)$ by Example 5.1.8, so $\dim[\text{im}(A)] = \dim[\text{col}(A)] = r$. Hence Theorem 5.4.1 provides a method of finding a basis of $\text{im}(A)$. This is recorded as part (2) of the following theorem.
Theorem 5.4.2
Let $A$ denote an $m \times n$ matrix of rank $r$. Then
- The $n - r$ basic solutions to the system $A\mathbf{x} = \mathbf{0}$ provided by the gaussian algorithm are a basis of $\text{null}(A)$, so $\dim[\text{null}(A)] = n - r$.
- Theorem 5.4.1 provides a basis of $\text{im}(A) = \text{col}(A)$, and $\dim[\text{im}(A)] = r$.
Proof:
It remains to prove (1). We already know (Theorem 2.2.1) that $\text{null}(A)$ is spanned by the $n - r$ basic solutions of $A\mathbf{x} = \mathbf{0}$. Hence using Theorem 5.2.7, it suffices to show that $\dim[\text{null}(A)] = n - r$. So let $\{\mathbf{x}_1, \dots, \mathbf{x}_k\}$ be a basis of $\text{null}(A)$, and extend it to a basis $\{\mathbf{x}_1, \dots, \mathbf{x}_k, \mathbf{x}_{k+1}, \dots, \mathbf{x}_n\}$ of $\mathbb{R}^n$ (by Theorem 5.2.6). It is enough to show that $\{A\mathbf{x}_{k+1}, \dots, A\mathbf{x}_n\}$ is a basis of $\text{im}(A)$; then $n - k = r$ by the above and so $k = n - r$ as required.
Spanning. Choose $A\mathbf{x}$ in $\text{im}(A)$, $\mathbf{x}$ in $\mathbb{R}^n$, and write $\mathbf{x} = a_1\mathbf{x}_1 + \cdots + a_k\mathbf{x}_k + a_{k+1}\mathbf{x}_{k+1} + \cdots + a_n\mathbf{x}_n$ where the $a_i$ are in $\mathbb{R}$.
Then $A\mathbf{x} = a_{k+1}A\mathbf{x}_{k+1} + \cdots + a_nA\mathbf{x}_n$ because $A\mathbf{x}_1 = \cdots = A\mathbf{x}_k = \mathbf{0}$.
Independence. Let $t_{k+1}A\mathbf{x}_{k+1} + \cdots + t_nA\mathbf{x}_n = \mathbf{0}$, $t_i$ in $\mathbb{R}$. Then $t_{k+1}\mathbf{x}_{k+1} + \cdots + t_n\mathbf{x}_n$ is in $\text{null}(A)$, so
$t_{k+1}\mathbf{x}_{k+1} + \cdots + t_n\mathbf{x}_n = s_1\mathbf{x}_1 + \cdots + s_k\mathbf{x}_k$
for some $s_1, \dots, s_k$ in $\mathbb{R}$. But then the independence of the $\mathbf{x}_i$ shows that $t_i = 0$ for every $i$.
Example 5.4.3
If , find bases of
and
, and so find their dimensions.
Solution:
If is in
, then
, so
is given by solving the system
. The reduction of the augmented matrix to reduced form is
Hence . Here,
has basis
by Theorem 5.4.1 because the leading
s are in columns 1 and 3. In particular,
as in Theorem 5.4.2.
Turning to , we use gaussian elimination. The leading variables are
and
, so the nonleading variables become parameters:
and
. It follows from the reduced matrix that
and
, so the general solution is
Hence . But
and
are solutions (basic), so
However Theorem 5.4.2 asserts that is a basis of
. (In fact it is easy to verify directly that
is independent in this case.) In particular,
.
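The rank, a basis of $\text{null}(A)$, and a basis of $\text{im}(A) = \text{col}(A)$ can all be read off in a few lines; a sketch with a hypothetical matrix (not the one from Example 5.4.3):

```python
from sympy import Matrix

# Hypothetical 3x5 matrix.
A = Matrix([[1, 2, 0, 1, 1],
            [0, 0, 1, 2, 1],
            [1, 2, 1, 3, 2]])

r = A.rank()
null_basis = A.nullspace()      # basic solutions of A x = 0
col_basis = A.columnspace()     # pivot columns of A

print(r, len(null_basis), len(col_basis))   # 2 3 2
print(r + len(null_basis) == A.cols)        # True: dim(im A) + dim(null A) = n
```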
Let $A$ be an $m \times n$ matrix. Corollary 5.4.2 asserts that $\text{rank } A \leq m$ and $\text{rank } A \leq n$, and it is natural to ask when these extreme cases arise. If $\mathbf{c}_1, \mathbf{c}_2, \dots, \mathbf{c}_n$ are the columns of $A$, Theorem 5.2.2 shows that $\{\mathbf{c}_1, \mathbf{c}_2, \dots, \mathbf{c}_n\}$ spans $\mathbb{R}^m$ if and only if the system $A\mathbf{x} = \mathbf{b}$ is consistent for every $\mathbf{b}$ in $\mathbb{R}^m$, and that $\{\mathbf{c}_1, \mathbf{c}_2, \dots, \mathbf{c}_n\}$ is independent if and only if $A\mathbf{x} = \mathbf{0}$, $\mathbf{x}$ in $\mathbb{R}^n$, implies $\mathbf{x} = \mathbf{0}$. The next two useful theorems improve on both these results, and relate them to when the rank of $A$ is $n$ or $m$.
Theorem 5.4.3
The following are equivalent for an $m \times n$ matrix $A$:
1. $\text{rank } A = n$.
2. The rows of $A$ span $\mathbb{R}^n$.
3. The columns of $A$ are linearly independent in $\mathbb{R}^m$.
4. The $n \times n$ matrix $A^TA$ is invertible.
5. $CA = I_n$ for some $n \times m$ matrix $C$.
6. If $A\mathbf{x} = \mathbf{0}$, $\mathbf{x}$ in $\mathbb{R}^n$, then $\mathbf{x} = \mathbf{0}$.
Proof:
(1) $\Rightarrow$ (2). We have $\text{row } A \subseteq \mathbb{R}^n$, and $\dim(\text{row } A) = n$ by (1), so $\text{row } A = \mathbb{R}^n$ by Theorem 5.2.8. This is (2).
(2) $\Rightarrow$ (3). By (2), $\text{row } A = \mathbb{R}^n$, so $\text{rank } A = n$. This means $\dim(\text{col } A) = n$. Since the $n$ columns of $A$ span $\text{col } A$, they are independent by Theorem 5.2.7.
(3) $\Rightarrow$ (4). If $(A^TA)\mathbf{x} = \mathbf{0}$, $\mathbf{x}$ in $\mathbb{R}^n$, we show that $\mathbf{x} = \mathbf{0}$ (Theorem 2.4.5). We have
$\|A\mathbf{x}\|^2 = (A\mathbf{x})^T(A\mathbf{x}) = \mathbf{x}^TA^TA\mathbf{x} = \mathbf{x}^T\mathbf{0} = 0$
Hence $A\mathbf{x} = \mathbf{0}$, so $\mathbf{x} = \mathbf{0}$ by (3) and Theorem 5.2.2.
(4) $\Rightarrow$ (5). Given (4), take $C = (A^TA)^{-1}A^T$.
(5) $\Rightarrow$ (6). If $A\mathbf{x} = \mathbf{0}$, then left multiplication by $C$ (from (5)) gives $\mathbf{x} = I_n\mathbf{x} = CA\mathbf{x} = C\mathbf{0} = \mathbf{0}$.
(6) $\Rightarrow$ (1). Given (6), the columns of $A$ are independent by Theorem 5.2.2. Hence $\dim(\text{col } A) = n$, and (1) follows.
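Condition (5) is constructive: when the columns are independent, $(A^TA)^{-1}A^T$ is a left inverse. A quick numerical check (the tall matrix below is a made-up example):

```python
import numpy as np

# A hypothetical 4x2 matrix with independent columns (rank 2).
A = np.array([[1., 0.],
              [1., 1.],
              [0., 1.],
              [2., 1.]])

C = np.linalg.inv(A.T @ A) @ A.T        # the left inverse from Theorem 5.4.3(5)
print(np.allclose(C @ A, np.eye(2)))    # True: C A = I_2
```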
Theorem 5.4.4
The following are equivalent for an $m \times n$ matrix $A$:
1. $\text{rank } A = m$.
2. The columns of $A$ span $\mathbb{R}^m$.
3. The rows of $A$ are linearly independent in $\mathbb{R}^n$.
4. The $m \times m$ matrix $AA^T$ is invertible.
5. $AC = I_m$ for some $n \times m$ matrix $C$.
6. The system $A\mathbf{x} = \mathbf{b}$ is consistent for every $\mathbf{b}$ in $\mathbb{R}^m$.
Proof:
(1) $\Rightarrow$ (2). By (1), $\dim(\text{col } A) = m$, so $\text{col } A = \mathbb{R}^m$ by Theorem 5.2.8.
(2) $\Rightarrow$ (3). By (2), $\text{col } A = \mathbb{R}^m$, so $\text{rank } A = m$. This means $\dim(\text{row } A) = m$. Since the $m$ rows of $A$ span $\text{row } A$, they are independent by Theorem 5.2.7.
(3) $\Rightarrow$ (4). We have $\text{rank } A = m$ by (3), so the $n \times m$ matrix $A^T$ has rank $m$. Hence applying Theorem 5.4.3 to $A^T$ in place of $A$ shows that $(A^T)^TA^T = AA^T$ is invertible, proving (4).
(4) $\Rightarrow$ (5). Given (4), take $C = A^T(AA^T)^{-1}$ in (5).
(5) $\Rightarrow$ (6). Comparing columns in $AC = I_m$ gives $A\mathbf{c}_j = \mathbf{e}_j$ for each $j$, where $\mathbf{c}_j$ and $\mathbf{e}_j$ denote column $j$ of $C$ and $I_m$ respectively. Given $\mathbf{b}$ in $\mathbb{R}^m$, write $\mathbf{b} = r_1\mathbf{e}_1 + r_2\mathbf{e}_2 + \cdots + r_m\mathbf{e}_m$, $r_j$ in $\mathbb{R}$. Then $A\mathbf{x} = \mathbf{b}$ holds with $\mathbf{x} = r_1\mathbf{c}_1 + r_2\mathbf{c}_2 + \cdots + r_m\mathbf{c}_m$, as the reader can verify.
(6) $\Rightarrow$ (1). Given (6), the columns of $A$ span $\mathbb{R}^m$ by Theorem 5.2.2. Thus $\text{col } A = \mathbb{R}^m$ and (1) follows.
5.5 Similarity and Diagonalization
Similar Matrices
Definition 5.11 Similar Matrices
If $A$ and $B$ are $n \times n$ matrices, we say that $A$ and $B$ are similar, and write $A \sim B$, if $B = P^{-1}AP$ for some invertible matrix $P$.
Note that $A \sim B$ if and only if $B = QAQ^{-1}$ where $Q$ is invertible (write $Q = P^{-1}$). The language of similarity is used throughout linear algebra. For example, a matrix $A$ is diagonalizable if and only if it is similar to a diagonal matrix.
If $A \sim B$, then necessarily $B \sim A$. To see why, suppose that $B = P^{-1}AP$. Then $A = PBP^{-1} = Q^{-1}BQ$ where $Q = P^{-1}$ is invertible. This proves the second of the following properties of similarity:
1. $A \sim A$ for all $n \times n$ matrices $A$.
2. If $A \sim B$, then $B \sim A$.
3. If $A \sim B$ and $B \sim C$, then $A \sim C$. (5.2)
These properties are often expressed by saying that the similarity relation $\sim$ is an equivalence relation on the set of $n \times n$ matrices. Here is an example showing how these properties are used.
Example 5.5.1
If $A$ is similar to $B$ and either $A$ or $B$ is diagonalizable, show that the other is also diagonalizable.
Solution:
We have $A \sim B$. Suppose that $A$ is diagonalizable, say $A \sim D$ where $D$ is diagonal. Since $B \sim A$ by (2) of (5.2), we have $B \sim A$ and $A \sim D$. Hence $B \sim D$ by (3) of (5.2), so $B$ is diagonalizable too. An analogous argument works if we assume instead that $B$ is diagonalizable.
Similarity is compatible with inverses, transposes, and powers:
If $A \sim B$, then $A^{-1} \sim B^{-1}$ (when the inverses exist), $A^T \sim B^T$, and $A^k \sim B^k$ for all integers $k \geq 1$.
Definition 5.12 Trace of a Matrix
The trace $\text{tr } A$ of an $n \times n$ matrix $A = [a_{ij}]$ is defined to be the sum of the main diagonal elements of $A$.
In other words:
$\text{tr } A = a_{11} + a_{22} + \cdots + a_{nn}$
It is evident that $\text{tr}(A + B) = \text{tr } A + \text{tr } B$ and that $\text{tr}(cA) = c\,\text{tr } A$ holds for all $n \times n$ matrices $A$ and $B$ and all scalars $c$. The following fact is more surprising.
Lemma 5.5.1
Let $A$ and $B$ be $n \times n$ matrices. Then $\text{tr}(AB) = \text{tr}(BA)$.
Proof:
Write $A = [a_{ij}]$ and $B = [b_{ij}]$. For each $i$, the $(i, i)$-entry $d_i$ of the matrix $AB$ is given as follows: $d_i = a_{i1}b_{1i} + a_{i2}b_{2i} + \cdots + a_{in}b_{ni} = \sum_j a_{ij}b_{ji}$. Hence
$\text{tr}(AB) = d_1 + d_2 + \cdots + d_n = \sum_i \left(\sum_j a_{ij}b_{ji}\right)$
Similarly we have $\text{tr}(BA) = \sum_i \left(\sum_j b_{ij}a_{ji}\right)$. Since these two double sums are the same, Lemma 5.5.1 is proved.
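A quick numerical illustration of the lemma (random matrices, chosen only for the check):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.integers(-5, 5, size=(4, 4)).astype(float)   # two random 4x4 matrices
B = rng.integers(-5, 5, size=(4, 4)).astype(float)

print(np.trace(A @ B), np.trace(B @ A))               # equal, as Lemma 5.5.1 asserts
print(np.isclose(np.trace(A @ B), np.trace(B @ A)))   # True
```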
Theorem 5.5.1
If $A$ and $B$ are similar $n \times n$ matrices, then $A$ and $B$ have the same determinant, rank, trace, characteristic polynomial, and eigenvalues.
Proof:
Let $B = P^{-1}AP$ for some invertible matrix $P$. Then we have
$\det B = \det(P^{-1})\det A \det P = \det A$ because $\det(P^{-1}) = 1/\det P$
Similarly, $\text{rank } B = \text{rank}(P^{-1}AP) = \text{rank } A$ by Corollary 5.4.3. Next Lemma 5.5.1 gives
$\text{tr } B = \text{tr}[P^{-1}(AP)] = \text{tr}[(AP)P^{-1}] = \text{tr } A$
As to the characteristic polynomial,
$c_B(x) = \det(xI - B) = \det(xI - P^{-1}AP) = \det[P^{-1}(xI - A)P] = \det(xI - A) = c_A(x)$
Finally, this shows that $A$ and $B$ have the same eigenvalues because the eigenvalues of a matrix are the roots of its characteristic polynomial.
Example 5.5.2
Sharing the five properties in Theorem 5.5.1 does not guarantee that two matrices are similar. The matrices $A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$ and $I = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$ have the same determinant, rank, trace, characteristic polynomial, and eigenvalues, but they are not similar because $P^{-1}IP = I \neq A$ for any invertible matrix $P$.
Diagonalization Revisited
Recall that a square matrix $A$ is diagonalizable if there exists an invertible matrix $P$ such that $P^{-1}AP$ is a diagonal matrix, that is if $A$ is similar to a diagonal matrix $D$. Unfortunately, not all matrices are diagonalizable, for example $\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$. Determining whether $A$ is diagonalizable is closely related to the eigenvalues and eigenvectors of $A$. Recall that a number $\lambda$ is called an eigenvalue of $A$ if $A\mathbf{x} = \lambda\mathbf{x}$ for some nonzero column $\mathbf{x}$ in $\mathbb{R}^n$, and any such nonzero vector $\mathbf{x}$ is called an eigenvector of $A$ corresponding to $\lambda$ (or simply a $\lambda$-eigenvector of $A$). The eigenvalues and eigenvectors of $A$ are closely related to the characteristic polynomial $c_A(x)$ of $A$, defined by
$c_A(x) = \det(xI - A)$
If $A$ is $n \times n$ this is a polynomial of degree $n$, and its relationship to the eigenvalues is given in the following theorem.
Theorem 5.5.2
Let $A$ be an $n \times n$ matrix.
- The eigenvalues $\lambda$ of $A$ are the roots of the characteristic polynomial $c_A(x)$ of $A$.
- The $\lambda$-eigenvectors $\mathbf{x}$ are the nonzero solutions to the homogeneous system
$(\lambda I - A)\mathbf{x} = \mathbf{0}$
of linear equations with $\lambda I - A$ as coefficient matrix.
The next theorem gives the condition under which a square matrix is diagonalizable.
Theorem 5.5.3
Let $A$ be an $n \times n$ matrix.
- $A$ is diagonalizable if and only if $\mathbb{R}^n$ has a basis $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n\}$ consisting of eigenvectors of $A$.
- When this is the case, the matrix $P = [\mathbf{x}_1\ \mathbf{x}_2\ \cdots\ \mathbf{x}_n]$ is invertible and
$P^{-1}AP = \text{diag}(\lambda_1, \lambda_2, \dots, \lambda_n)$
where, for each $i$, $\lambda_i$ is the eigenvalue of $A$ corresponding to $\mathbf{x}_i$.
The next result is a basic tool for determining when a matrix is diagonalizable. It reveals an important connection between eigenvalues and linear independence: Eigenvectors corresponding to distinct eigenvalues are necessarily linearly independent.
Theorem 5.5.4
Let $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k$ be eigenvectors corresponding to distinct eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_k$ of an $n \times n$ matrix $A$. Then $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k\}$ is a linearly independent set.
Theorem 5.5.5
If $A$ is an $n \times n$ matrix with $n$ distinct eigenvalues, then $A$ is diagonalizable.
Example 5.5.4
Show that the given $3 \times 3$ matrix $A$ is diagonalizable.
Solution:
A routine computation of the characteristic polynomial $c_A(x)$ shows that $A$ has three distinct eigenvalues. Hence Theorem 5.5.5 applies.
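Numerically, the eigenvector matrix $P$ and the diagonal matrix $D$ of Theorem 5.5.3 can be obtained together; a sketch with a hypothetical matrix that has three distinct eigenvalues (not the matrix of Example 5.5.4):

```python
import numpy as np

# Hypothetical 3x3 matrix with distinct eigenvalues (so it is diagonalizable).
A = np.array([[2., 1., 0.],
              [0., 3., 0.],
              [0., 0., 5.]])

evals, P = np.linalg.eig(A)        # columns of P are eigenvectors of A
D = np.diag(evals)
print(np.allclose(np.linalg.inv(P) @ A @ P, D))   # True: P^{-1} A P is diagonal
print(evals)                                      # the eigenvalues 2, 3, 5 (in some order)
```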
Definition 5.13 Eigenspace of a Matrix
If $\lambda$ is an eigenvalue of an $n \times n$ matrix $A$, define the eigenspace of $A$ corresponding to $\lambda$ by
$E_\lambda(A) = \{\mathbf{x} \in \mathbb{R}^n \mid A\mathbf{x} = \lambda\mathbf{x}\}$
This is a subspace of $\mathbb{R}^n$ and the eigenvectors corresponding to $\lambda$ are just the nonzero vectors in $E_\lambda(A)$. In fact $E_\lambda(A)$ is the null space of the matrix $\lambda I - A$:
$E_\lambda(A) = \{\mathbf{x} \mid (\lambda I - A)\mathbf{x} = \mathbf{0}\} = \text{null}(\lambda I - A)$
The basic solutions of the homogeneous system $(\lambda I - A)\mathbf{x} = \mathbf{0}$ given by the gaussian algorithm form a basis for $E_\lambda(A)$. In particular
$\dim E_\lambda(A)$ is the number of basic solutions of $(\lambda I - A)\mathbf{x} = \mathbf{0}$ (5.5)
Now recall that the multiplicity of an eigenvalue $\lambda$ of $A$ is the number of times $\lambda$ occurs as a root of the characteristic polynomial $c_A(x)$ of $A$. In other words, the multiplicity of $\lambda$ is the largest integer $m$ such that
$c_A(x) = (x - \lambda)^m g(x)$
for some polynomial $g(x)$. Because of (5.5), a square matrix is diagonalizable if and only if the multiplicity of each eigenvalue $\lambda$ equals $\dim[E_\lambda(A)]$. We are going to prove this, and the proof requires the following result, which is valid for any square matrix, diagonalizable or not.
Lemma 5.5.3
Let $\lambda$ be an eigenvalue of multiplicity $m$ of a square matrix $A$. Then $\dim[E_\lambda(A)] \leq m$.
It turns out that this characterizes the diagonalizable $n \times n$ matrices $A$ for which $c_A(x)$ factors completely over $\mathbb{R}$. By this we mean that
$c_A(x) = (x - \lambda_1)(x - \lambda_2)\cdots(x - \lambda_n)$
where the $\lambda_i$ are real numbers (not necessarily distinct); in other words, every eigenvalue of $A$ is real. This need not happen (consider $A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$), which leads us to the general conclusion regarding when a square matrix is diagonalizable.
Theorem 5.5.6
The following are equivalent for a square matrix $A$ for which $c_A(x)$ factors completely.
1. $A$ is diagonalizable.
2. $\dim[E_\lambda(A)]$ equals the multiplicity of $\lambda$ for every eigenvalue $\lambda$ of the matrix $A$.
Example 5.5.5
Given the matrices $A$ and $B$, show that $A$ is diagonalizable but $B$ is not.
Solution:
Computing $c_A(x)$ shows that $A$ has two eigenvalues, $\lambda_1$ and $\lambda_2$, one of multiplicity 2. The corresponding eigenspaces $E_{\lambda_1}(A)$ and $E_{\lambda_2}(A)$ have bases that the reader can verify. Since the basis of the repeated eigenvalue's eigenspace is independent and contains two vectors, $\dim E_{\lambda_1}(A)$ equals the multiplicity of $\lambda_1$. Similarly, $\dim E_{\lambda_2}(A)$ equals the multiplicity of $\lambda_2$. Hence $A$ is diagonalizable by Theorem 5.5.6, and a diagonalizing matrix $P$ is obtained by taking the basis eigenvectors as its columns.
Turning to $B$, computing $c_B(x)$ shows that $B$ also has two eigenvalues, $\mu_1$ and $\mu_2$, with corresponding eigenspaces $E_{\mu_1}(B)$ and $E_{\mu_2}(B)$. Here $\dim E_{\mu_1}(B)$ is smaller than the multiplicity of $\mu_1$, so the matrix $B$ is not diagonalizable, again by Theorem 5.5.6. The fact that the eigenspace is too small means that there is no possibility of finding three linearly independent eigenvectors.
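The diagnostic in Theorem 5.5.6 is easy to run in code. A sketch comparing a diagonalizable and a non-diagonalizable matrix (both made up for illustration; they are not the $A$ and $B$ of Example 5.5.5):

```python
from sympy import Matrix, eye

def diagnose(M):
    """For each eigenvalue, compare algebraic multiplicity with dim E_lambda(M)."""
    n = M.rows
    results = []
    for lam, alg_mult in M.eigenvals().items():
        geo_mult = len((lam * eye(n) - M).nullspace())   # dimension of the eigenspace
        results.append((lam, alg_mult, geo_mult))
    return results

A = Matrix([[3, 0, 0], [0, 2, 0], [0, 0, 2]])   # diagonalizable: eigenvalue 2 has mult 2, dim 2
B = Matrix([[2, 1, 0], [0, 2, 0], [0, 0, 3]])   # not diagonalizable: 2 has mult 2 but dim 1
print(diagnose(A))   # geometric = algebraic for every eigenvalue
print(diagnose(B))   # eigenvalue 2: algebraic 2 but geometric 1, so Theorem 5.5.6 fails
```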