Definition
The Hessian of a twice-differentiable function [latex]f: \mathbb{R}^n \rightarrow \mathbb{R}[/latex] at a point [latex]x\in {\bf dom} f[/latex] is the matrix containing the second derivatives of the function at that point. That is, the Hessian is the matrix with elements given by
[latex]\begin{align*} H_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}(x),\quad 1\leq i,j \leq n. \end{align*}[/latex]
The Hessian of [latex]f[/latex] at [latex]x[/latex] is often denoted [latex]\nabla^2 f(x)[/latex].
When the second partial derivatives are continuous, the mixed partial derivatives do not depend on the order in which the derivatives are taken. Hence, [latex]H_{ij} = H_{ji}[/latex] for every pair [latex](i,j)[/latex]. Thus, the Hessian is a symmetric matrix.
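As a rough illustration of the definition, the Hessian can be approximated entry by entry with central finite differences. The sketch below is only one possible approach: the helper `numerical_hessian`, the step size `h`, and the test function `f` are hypothetical choices made for this example and do not come from the text.

```python
import numpy as np

def numerical_hessian(f, x, h=1e-4):
    """Approximate the Hessian of f at x by central finite differences."""
    x = np.asarray(x, dtype=float)
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            e_i = np.zeros(n)
            e_i[i] = h
            e_j = np.zeros(n)
            e_j[j] = h
            # Central-difference estimate of the (i, j) second partial derivative.
            H[i, j] = (f(x + e_i + e_j) - f(x + e_i - e_j)
                       - f(x - e_i + e_j) + f(x - e_i - e_j)) / (4 * h**2)
    return H

# Hypothetical test function, chosen only for illustration:
# f(x) = x_1^2 x_2 + sin(x_2), whose exact Hessian is
# [[2 x_2, 2 x_1], [2 x_1, -sin(x_2)]].
def f(x):
    return x[0]**2 * x[1] + np.sin(x[1])

H = numerical_hessian(f, [1.0, 2.0])
print(H)        # approximately [[4, 2], [2, -sin(2)]]
print(H - H.T)  # close to zero: the approximation is symmetric up to rounding
```

The approximation is only accurate up to truncation and rounding error, so entries should be compared with a tolerance rather than for exact equality.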
Examples
Hessian of a quadratic function
Consider the quadratic function
[latex]\begin{align*} q(x) = x_1^2 + 2x_1 x_2 + 3x_2^2 + 4x_1 + 5x_2 + 6. \end{align*}[/latex]
The Hessian of [latex]q[/latex] at [latex]x[/latex] is given by
[latex]\begin{align*} \nabla^2 q(x) = \left(\begin{array}{cc} \dfrac{\partial^2 q}{\partial x_1^2}(x) & \dfrac{\partial^2 q}{\partial x_1 \partial x_2}(x) \\[3ex] \dfrac{\partial^2 q}{\partial x_2 \partial x_1}(x) & \dfrac{\partial^2 q}{\partial x_2^2}(x) \end{array}\right) = \left(\begin{array}{cc} 2 & 2 \\ 2 & 6 \end{array}\right). \end{align*}[/latex]
For quadratic functions, the Hessian is a constant matrix, that is, it does not depend on the point at which it is evaluated.
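The quadratic example can be checked numerically. The sketch below writes [latex]q[/latex] in the matrix form [latex]q(x) = x^T A x + b^T x + c[/latex], where the names `A`, `b`, `c` and the helper `second_differences` are introduced here for illustration only. For a quadratic function, second differences with unit steps reproduce the Hessian entries exactly, at any base point.

```python
import numpy as np

# Matrix form of q(x) = x1^2 + 2 x1 x2 + 3 x2^2 + 4 x1 + 5 x2 + 6.
A = np.array([[1.0, 1.0], [1.0, 3.0]])
b = np.array([4.0, 5.0])
c = 6.0

def q(x):
    return x @ A @ x + b @ x + c

def second_differences(x):
    """Matrix of second differences of q at x, using unit steps."""
    e = np.eye(2)
    H = np.zeros((2, 2))
    for i in range(2):
        for j in range(2):
            H[i, j] = q(x + e[i] + e[j]) - q(x + e[i]) - q(x + e[j]) + q(x)
    return H

H_origin = second_differences(np.array([0.0, 0.0]))
H_other = second_differences(np.array([-3.0, 7.0]))
print(H_origin)  # [[2. 2.] [2. 6.]]
print(H_other)   # same matrix at a different point
print(A + A.T)   # the Hessian in matrix form
```

Both base points give the same matrix, which equals `A + A.T`, consistent with the Hessian of a quadratic being constant.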
Hessian of the log-sum-exp function
Consider the "log-sum-exp" function [latex]\mathrm{lse}: \mathbb{R}^2 \rightarrow \mathbb{R}[/latex], with values
[latex]\begin{align*} \mathrm{lse}(x):= \log(e^{x_1}+e^{x_2}). \end{align*}[/latex]
The gradient of [latex]\mathrm{lse}[/latex] at [latex]x[/latex] is
[latex]\begin{align*} \nabla \mathrm{lse}(x) = \frac{1}{z_1 + z_2}\left(\begin{array}{c} z_1 \\ z_2 \end{array}\right), \end{align*}[/latex]
where [latex]z_i := e^{x_i}[/latex], [latex]i=1,2[/latex]. The Hessian is given by
[latex]\begin{align*} \nabla^2 \mathrm{lse}(x) = \frac{z_1 z_2}{(z_1 +z_2)^2}\left(\begin{array}{cc} 1 & -1 \\ -1 & 1 \end{array}\right). \end{align*}[/latex]
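The two-dimensional formulas above can be sanity-checked by differentiating the gradient numerically. In the sketch below, the function names `lse2_gradient` and `lse2_hessian`, the evaluation point `x`, and the step `h` are assumptions made for this example.

```python
import numpy as np

def lse2_gradient(x):
    """Gradient of log(exp(x1) + exp(x2)): z / (z1 + z2)."""
    z = np.exp(x)
    return z / z.sum()

def lse2_hessian(x):
    """Closed-form Hessian: z1 z2 / (z1 + z2)^2 * [[1, -1], [-1, 1]]."""
    z = np.exp(x)
    scale = z[0] * z[1] / (z[0] + z[1])**2
    return scale * np.array([[1.0, -1.0], [-1.0, 1.0]])

x = np.array([0.3, -1.2])  # arbitrary evaluation point
h = 1e-6

# Column j holds the central-difference derivative of the gradient
# with respect to x_j.
H_fd = np.column_stack([
    (lse2_gradient(x + h * e) - lse2_gradient(x - h * e)) / (2 * h)
    for e in np.eye(2)
])

print(lse2_hessian(x))
print(H_fd)  # should agree with the closed form up to small numerical error
```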
More generally, consider the log-sum-exp function [latex]\mathrm{lse}: \mathbb{R}^n \rightarrow \mathbb{R}[/latex], with values
[latex]\begin{align*} \mathrm{lse}(x):= \log\left(\sum_{i=1}^{n} e^{x_i}\right). \end{align*}[/latex]
Its Hessian is obtained as follows.
● First, the gradient at a point [latex]x[/latex] is
[latex]\begin{align*} \nabla \mathrm{lse}(x) = \frac{1}{\sum_{i=1}^n e^{x_i}}\left(\begin{array}{c} e^{x_1} \\ \vdots\\ e^{x_n} \end{array}\right) = \frac{1}{Z} z, \end{align*}[/latex]
where [latex]z=\left(\begin{array}{c} e^{x_1} \\ \vdots\\ e^{x_n} \end{array}\right)[/latex], and [latex]Z = \sum_{i=1}^n z_i[/latex].
● Now the Hessian at a point [latex]x[/latex] is obtained by taking derivatives of each component of the gradient. If [latex]g_i(x)[/latex] is the [latex]i[/latex]-th component, that is,
[latex]\begin{align*} g_i(x) = \frac{e^{x_i}}{\sum_{k=1}^n e^{x_k}} = \frac{z_i}{Z}, \end{align*}[/latex]
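then, by the quotient rule (using [latex]\partial z_i / \partial x_i = z_i[/latex] and [latex]\partial Z / \partial x_j = z_j[/latex]),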
[latex]\begin{align*} \frac{\partial g_i(x)}{\partial x_i} = \frac{z_i}{Z} - \frac{z_i^2}{Z^2}, \end{align*}[/latex]
and, for [latex]j \neq i[/latex]:
[latex]\begin{align*} \frac{\partial g_i(x)}{\partial x_j} = -\frac{z_i z_j}{Z^2}. \end{align*}[/latex]
More compactly:
[latex]\begin{align*} \nabla^2 \mathrm{lse}(x) = \frac{1}{Z^2} (Z {\bf diag}(z) - zz^T). \end{align*}[/latex]
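The compact formula translates almost directly into code. The following is a minimal sketch, assuming NumPy. The function name `lse_hessian` and the test points `x2` and `x5` are hypothetical. It evaluates [latex]e^{x_i}[/latex] directly, so it is not meant for inputs large enough to overflow.

```python
import numpy as np

def lse_hessian(x):
    """Hessian of log-sum-exp at x: (Z diag(z) - z z^T) / Z^2."""
    z = np.exp(np.asarray(x, dtype=float))
    Z = z.sum()
    return (Z * np.diag(z) - np.outer(z, z)) / Z**2

# For n = 2 the general formula should reduce to the two-dimensional case.
x2 = np.array([0.3, -1.2])
z1, z2 = np.exp(x2)
H2_closed_form = z1 * z2 / (z1 + z2)**2 * np.array([[1.0, -1.0], [-1.0, 1.0]])
H2 = lse_hessian(x2)
print(np.allclose(H2, H2_closed_form))

# A larger example: the result is symmetric, as any Hessian should be.
x5 = np.array([0.5, -1.0, 2.0, 0.0, 1.5])
H5 = lse_hessian(x5)
print(np.allclose(H5, H5.T))
```

The diagonal entries of the result correspond to [latex]z_i/Z - z_i^2/Z^2[/latex] and the off-diagonal entries to [latex]-z_i z_j/Z^2[/latex], matching the componentwise derivatives above.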