"

Hessian of a Function

Definition

The Hessian of a twice-differentiable function f: \mathbb{R}^n \rightarrow \mathbb{R} at a point x\in {\bf dom} f is the matrix containing the second derivatives of the function at that point. That is, the Hessian is the matrix with elements given by

H_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}(x),\quad 1\leq i,j \leq n.

The Hessian of f at x is often denoted \nabla^2 f(x).

When the second partial derivatives of f are continuous at x, the mixed partial derivatives do not depend on the order in which the derivatives are taken. Hence, H_{ij} = H_{ji} for every pair (i,j), and the Hessian is a symmetric matrix.
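The definition can be checked numerically. The sketch below approximates each entry H_{ij} by a central finite difference and compares the result with a Hessian derived by hand. The helper name `numerical_hessian`, the step size `h`, and the test function `f_example` are illustrative assumptions, not part of the text. Note that this particular stencil is symmetric in (i,j) by construction, so it illustrates the definition rather than proving symmetry.

```python
import numpy as np

def numerical_hessian(f, x, h=1e-4):
    """Approximate the Hessian of f at x with central differences.

    Hypothetical helper for illustration; h is an assumed step size.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        ei = np.zeros(n)
        ei[i] = h  # step along coordinate i
        for j in range(n):
            ej = np.zeros(n)
            ej[j] = h  # step along coordinate j
            # Four-point stencil for the mixed partial d^2 f / dx_i dx_j.
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4.0 * h * h)
    return H

def f_example(x):
    # Assumed test function: f(x) = x_1^2 x_2 + sin(x_2).
    return x[0] ** 2 * x[1] + np.sin(x[1])

x0 = np.array([1.0, 2.0])
H_num = numerical_hessian(f_example, x0)

# Hessian of f_example computed by hand.
H_exact = np.array([[2.0 * x0[1], 2.0 * x0[0]],
                    [2.0 * x0[0], -np.sin(x0[1])]])

print(H_num)
print(H_exact)
```

The two printed matrices should agree to several decimal places. The agreement degrades if `h` is chosen much smaller, because rounding errors are divided by h^2.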

Examples

Hessian of a quadratic function

Consider the quadratic function

q(x) = x_1^2 + 2x_1 x_2 + 3x_2^2 + 4x_1 + 5x_2 +6

The Hessian of q at x is given by

\nabla^2 q(x)=\left(\begin{array}{cc} \frac{\partial^2 q}{\partial x_1^2}(x) & \frac{\partial^2 q}{\partial x_1 \partial x_2}(x) \\ \frac{\partial^2 q}{\partial x_2 \partial x_1}(x) & \frac{\partial^2 q}{\partial x_2^2}(x) \end{array}\right)=\left(\begin{array}{cc} 2 & 2 \\ 2 & 6 \end{array}\right).

For quadratic functions, the Hessian is a constant matrix, that is, it does not depend on the point at which it is evaluated.
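One way to see why the Hessian is constant is to write q in matrix form, q(x) = \frac{1}{2} x^T A x + b^T x + c, where A is the Hessian above. The sketch below checks that both expressions agree at a few random points. The names `A`, `b`, `c`, and `q_matrix_form` are introduced here for illustration only.

```python
import numpy as np

# Hessian, linear term and constant term read off from q (names are assumptions).
A = np.array([[2.0, 2.0],
              [2.0, 6.0]])
b = np.array([4.0, 5.0])
c = 6.0

def q(x):
    # The quadratic function exactly as written in the text.
    x1, x2 = x
    return x1**2 + 2*x1*x2 + 3*x2**2 + 4*x1 + 5*x2 + 6

def q_matrix_form(x):
    # Same function written as (1/2) x^T A x + b^T x + c.
    x = np.asarray(x, dtype=float)
    return 0.5 * x @ A @ x + b @ x + c

rng = np.random.default_rng(0)
points = rng.standard_normal((5, 2))

# Largest discrepancy between the two expressions over the sample points.
max_gap = max(abs(q(p) - q_matrix_form(p)) for p in points)
print(max_gap)
```

Since A does not depend on x, neither does the Hessian. The printed gap should be at the level of floating-point rounding.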

Hessian of the log-sum-exp function

Consider the "log-sum-exp" function lse: \mathbb{R}^2 \rightarrow \mathbb{R}, with values

lse(x):= \log(e^{x_1}+e^{x_2}).

The gradient of lse at x is

\nabla lse(x) = \frac{1}{z_1 + z_2}\left(\begin{array}{c} z_1 \\ z_2 \end{array}\right),

where z_i := e^{x_i}, i=1,2. The Hessian is given by

\nabla^2 lse(x) = \frac{z_1 z_2}{(z_1 +z_2)^2}\left(\begin{array}{cc} 1 & -1 \\ -1 & 1 \end{array}\right).
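As a worked step, the first row of this matrix can be recovered by differentiating the first component of the gradient, written here as g_1(x) = z_1/(z_1+z_2), and using \partial z_i / \partial x_i = z_i:

\frac{\partial g_1}{\partial x_1}(x) = \frac{z_1}{z_1+z_2} - \frac{z_1^2}{(z_1+z_2)^2} = \frac{z_1 z_2}{(z_1+z_2)^2}, \quad \frac{\partial g_1}{\partial x_2}(x) = -\frac{z_1 z_2}{(z_1+z_2)^2}.

The second row follows by exchanging the roles of the indices 1 and 2.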

More generally, consider the function lse: \mathbb{R}^n \rightarrow \mathbb{R} with values

lse(x):= \log\sum\limits_{i=1}^{n} e^{x_i}.

Its gradient and Hessian are obtained as follows.

  • First, the gradient at a point x is
\nabla lse(x) = \frac{1}{\sum_{i=1}^n e^{x_i}}\left(\begin{array}{c} e^{x_1} \\ \vdots \\ e^{x_n} \end{array}\right) = \frac{1}{Z} z,

where z= (e^{x_1}, \ldots, e^{x_n}), and Z = \sum_{i=1}^n z_i.

  • Now the Hessian at a point x is obtained by taking derivatives of each component of the gradient. If g_i(x) is the i-th component, that is,
g_i(x) = \frac{e^{x_i}}{\sum_{j=1}^n e^{x_j}} = \frac{z_i}{Z},

then

\frac{\partial g_i(x)}{\partial x_i} = \frac{z_i}{Z} - \frac{z_i^2}{Z^2},

and, for j \neq i:

\frac{\partial g_i(x)}{\partial x_j} = -\frac{z_i z_j}{Z^2}.

More compactly:

\nabla^2 lse(x) = \frac{1}{Z^2} (Z {\bf diag}(z) - zz^T).
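The compact formula translates almost directly into code. The following is a minimal sketch, assuming NumPy. The function name `lse_grad_hess` is hypothetical, and the shift by max(x) is an added numerical safeguard that is not part of the derivation above. It rescales z and Z by the same factor, so the ratios z/Z and (Z {\bf diag}(z) - zz^T)/Z^2 are unchanged.

```python
import numpy as np

def lse_grad_hess(x):
    """Gradient and Hessian of log-sum-exp at x (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    # Shift by max(x) to avoid overflow in exp; the ratios below are unaffected.
    z = np.exp(x - np.max(x))
    Z = z.sum()
    grad = z / Z
    # Compact formula: (Z diag(z) - z z^T) / Z^2.
    hess = (Z * np.diag(z) - np.outer(z, z)) / Z**2
    return grad, hess

# Compare with the closed form for n = 2 given earlier in this section.
x = np.array([0.5, -1.0])
grad, hess = lse_grad_hess(x)
z1, z2 = np.exp(x)
hess_2d = z1 * z2 / (z1 + z2) ** 2 * np.array([[1.0, -1.0],
                                               [-1.0, 1.0]])
print(hess)
print(hess_2d)

# A three-dimensional point, used for simple sanity checks.
grad3, hess3 = lse_grad_hess([1.0, 2.0, 3.0])
print(hess3.sum(axis=1))
```

Two quick sanity checks follow from the formula: the Hessian is symmetric, and each of its rows sums to zero, since (Z {\bf diag}(z) - zz^T) applied to the vector of ones gives Zz - Zz = 0.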

License

Hyper-Textbook: Optimization Models and Applications Copyright © by L. El Ghaoui. All Rights Reserved.