Hessian of a Function
Definition
The Hessian of a twice-differentiable function $f : \mathbb{R}^n \rightarrow \mathbb{R}$ at a point $x$ is the $n \times n$ matrix containing the second partial derivatives of the function at that point. That is, the Hessian is the matrix $H$ with elements given by
$$H_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}(x),\quad 1\leq i,j \leq n.$$
The Hessian of $f$ at $x$ is often denoted $\nabla^2 f(x)$.
When the second partial derivatives are continuous, the result does not depend on the order in which the derivatives are taken. Hence, for every pair $(i,j)$, $H_{ij} = H_{ji}$. Thus, the Hessian is a symmetric matrix.
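The definition can be checked numerically. The following is a minimal sketch, assuming a central finite-difference approximation of each second partial derivative; the helper `numerical_hessian`, the step size `h`, and the test function `f` are illustrative choices and are not part of the text above.

```python
def numerical_hessian(f, x, h=1e-4):
    """Approximate the Hessian of f at x with central finite differences."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def f_shift(si, sj):
                # Evaluate f after moving coordinate i by si*h and coordinate j by sj*h.
                y = list(x)
                y[i] += si * h
                y[j] += sj * h
                return f(y)
            # Four-point stencil for the (i, j) second partial derivative.
            H[i][j] = (f_shift(1, 1) - f_shift(1, -1)
                       - f_shift(-1, 1) + f_shift(-1, -1)) / (4.0 * h * h)
    return H


def f(x):
    # Hypothetical test function (an assumption for illustration):
    # f(x) = x1^2 * x2 + x2^3
    return x[0] ** 2 * x[1] + x[1] ** 3


H = numerical_hessian(f, [1.0, 2.0])
# Analytic Hessian is [[2*x2, 2*x1], [2*x1, 6*x2]], i.e. [[4, 2], [2, 12]] at (1, 2).
print(H)
```

The two off-diagonal entries agree up to rounding. This stencil is symmetric in $i$ and $j$ by construction, so the comparison with the analytic values is the more informative check.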
Examples
Hessian of a quadratic function
Consider the quadratic function
$$q(x) = x_1^2 + 2x_1 x_2 + 3x_2^2 + 4x_1 + 5x_2 + 6.$$
The Hessian of $q$ at $x$ is given by
$$\nabla^2 q(x) = \begin{pmatrix} \frac{\partial^2 q}{\partial x_1^2}(x) & \frac{\partial^2 q}{\partial x_1 \partial x_2}(x) \\ \frac{\partial^2 q}{\partial x_2 \partial x_1}(x) & \frac{\partial^2 q}{\partial x_2^2}(x) \end{pmatrix} = \begin{pmatrix} 2 & 2 \\ 2 & 6 \end{pmatrix}.$$
For quadratic functions, the Hessian is a constant matrix, that is, it does not depend on the point at which it is evaluated.
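One way to see this is to write the quadratic function in matrix form. The symbols $A$, $b$ and $c$ below are introduced here for illustration only:
$$q(x) = \frac{1}{2} x^T A x + b^T x + c, \quad A = \begin{pmatrix} 2 & 2 \\ 2 & 6 \end{pmatrix}, \quad b = \begin{pmatrix} 4 \\ 5 \end{pmatrix}, \quad c = 6.$$
Since $A$ is symmetric, differentiating once and then twice gives
$$\nabla q(x) = A x + b = \begin{pmatrix} 2x_1 + 2x_2 + 4 \\ 2x_1 + 6x_2 + 5 \end{pmatrix}, \qquad \nabla^2 q(x) = A,$$
so the linear and constant terms do not contribute to the Hessian, and no dependence on $x$ remains.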
Hessian of the log-sum-exp function
Consider the “log-sum-exp” function $\mathrm{lse} : \mathbb{R}^2 \rightarrow \mathbb{R}$, with values
$$\mathrm{lse}(x) := \log(e^{x_1}+e^{x_2}).$$
The gradient of $\mathrm{lse}$ at $x$ is
$$\nabla \mathrm{lse}(x) = \frac{1}{z_1 + z_2}\begin{pmatrix} z_1 \\ z_2 \end{pmatrix},$$
where $z_i := e^{x_i}$, $i = 1,2$. The Hessian is given by
$$\nabla^2 \mathrm{lse}(x) = \frac{z_1 z_2}{(z_1 +z_2)^2}\begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}.$$
More generally, the Hessian of the function $\mathrm{lse} : \mathbb{R}^n \rightarrow \mathbb{R}$ with values
$$\mathrm{lse}(x) := \log \sum_{i=1}^{n} e^{x_i}$$
is as follows.
- First, the gradient at a point $x$ is
$$\nabla \mathrm{lse}(x) = \frac{1}{\sum_{i=1}^n e^{x_i}}\begin{pmatrix} e^{x_1} \\ \vdots \\ e^{x_n} \end{pmatrix} = \frac{1}{Z} z,$$
where $z := (e^{x_1}, \ldots, e^{x_n})$ and $Z := \sum_{i=1}^n z_i$.
- Now, the Hessian at a point $x$ is obtained by taking derivatives of each component of the gradient. If $g_i(x)$ is the $i$-th component, that is,
$$g_i(x) = \frac{e^{x_i}}{\sum_{k=1}^n e^{x_k}} = \frac{z_i}{Z},$$
then, by the quotient rule with $\partial z_i / \partial x_i = z_i$ and $\partial Z / \partial x_j = z_j$,
$$\frac{\partial g_i(x)}{\partial x_i} = \frac{z_i}{Z} - \frac{z_i^2}{Z^2},$$
and, for $j \neq i$:
$$\frac{\partial g_i(x)}{\partial x_j} = -\frac{z_i z_j}{Z^2}.$$
More compactly:
$$\nabla^2 \mathrm{lse}(x) = \frac{1}{Z^2} \left(Z \, \mathbf{diag}(z) - zz^T\right).$$
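The compact formula translates directly into code. The sketch below is one possible implementation, not a reference one: the function name `lse_hessian` is hypothetical, and subtracting `max(x)` before exponentiating is an added assumption intended to avoid overflow. It does not change the result, because scaling $z$ by a positive constant scales both $Z \, \mathbf{diag}(z) - zz^T$ and $Z^2$ by the square of that constant.

```python
import math


def lse_hessian(x):
    """Hessian of log-sum-exp at x, computed as (1/Z^2) (Z diag(z) - z z^T)."""
    m = max(x)  # shift for numerical stability (an addition, not in the text)
    z = [math.exp(xi - m) for xi in x]
    Z = sum(z)
    n = len(x)
    return [
        [((Z * z[i] if i == j else 0.0) - z[i] * z[j]) / Z ** 2 for j in range(n)]
        for i in range(n)
    ]


# At x = (0, 0) we have z = (1, 1) and Z = 2, so the two-variable formula
# gives (1/4) * [[1, -1], [-1, 1]].
H = lse_hessian([0.0, 0.0])
print(H)  # [[0.25, -0.25], [-0.25, 0.25]]
```

For $n = 2$ this reproduces the two-variable Hessian given earlier, since $Z z_1 - z_1^2 = z_1 z_2$ when $Z = z_1 + z_2$.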