Consider [latex]m[/latex] [latex]n[/latex]-vectors [latex]x_1, \cdots, x_m[/latex]. The Gram matrix of the collection is the [latex]m\times m[/latex] matrix [latex]G[/latex] with elements [latex]G_{ij}=x_i^Tx_j[/latex]. The matrix can be expressed compactly in terms of the matrix [latex]X = [x_1, \cdots, x_m][/latex], as
[latex]\begin{align*} G &= X^T X = \left(\begin{array}{c} x_1^T \\ \vdots \\ x_m^T \end{array}\right) \left(\begin{array}{lll} x_1 & \ldots & x_m \end{array}\right). \end{align*}[/latex]
By construction, a Gram matrix is always symmetric, meaning that [latex]G_{ij} = G_{ji}[/latex] for every pair [latex](i,j)[/latex]. It is also positive semi-definite, meaning that [latex]u^TGu \ge 0[/latex] for every vector [latex]u \in \mathbb{R}^m[/latex] (this follows from the identity [latex]u^TGu = u^TX^TXu = ||Xu||_2^2[/latex]).
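As a minimal numerical sketch (using NumPy; the dimensions and random data below are illustrative only), we can form the Gram matrix and check the two properties above:

```python
import numpy as np

# Illustrative collection of m = 4 vectors in R^n with n = 3,
# stored as the columns of X (so X is n-by-m, as in the text).
rng = np.random.default_rng(0)
n, m = 3, 4
X = rng.standard_normal((n, m))

# Gram matrix G = X^T X, an m-by-m matrix with G[i, j] = x_i^T x_j.
G = X.T @ X

# Symmetry: G[i, j] == G[j, i].
assert np.allclose(G, G.T)

# Positive semi-definiteness: u^T G u = ||X u||_2^2 >= 0 for any u in R^m.
u = rng.standard_normal(m)
assert np.isclose(u @ G @ u, np.linalg.norm(X @ u) ** 2)
```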
Assume that each vector [latex]x_i[/latex] is normalized: [latex]||x_i||_2 =1[/latex]. Then the coefficient [latex]G_{ij}[/latex] can be expressed as
[latex]\begin{align*} G_{ij} &= \cos \theta_{ij}, \end{align*}[/latex]
where [latex]\theta_{ij}[/latex] is the angle between the vectors [latex]x_i[/latex] and [latex]x_j[/latex]. Thus [latex]G_{ij}[/latex] is a measure of how similar [latex]x_i[/latex] and [latex]x_j[/latex] are.
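A short sketch of this case (again in NumPy, with illustrative random data): after normalizing each column of [latex]X[/latex], the entries of [latex]G[/latex] are cosines of the pairwise angles.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 3, 4
X = rng.standard_normal((n, m))

# Normalize each column so that ||x_i||_2 = 1.
X = X / np.linalg.norm(X, axis=0)

# With unit-norm columns, G[i, j] = x_i^T x_j = cos(theta_ij).
G = X.T @ X

# Recover the angle (in radians) between x_1 and x_2.
theta_12 = np.arccos(np.clip(G[0, 1], -1.0, 1.0))
```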
The matrix [latex]G[/latex] arises for example in text document classification, with [latex]G_{ij}[/latex] a measure of similarity between the [latex]i[/latex]-th and [latex]j[/latex]-th documents, and [latex]x_i, x_j[/latex] their respective bag-of-words representations (normalized to have Euclidean norm 1).
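A toy illustration of this use (the word counts below are made up for the example, not taken from any real corpus): starting from raw bag-of-words counts, normalizing each document vector and forming the Gram matrix gives pairwise cosine similarities between documents.

```python
import numpy as np

# Hypothetical term counts for 3 documents over a 5-word vocabulary;
# rows are vocabulary words, columns are documents.
counts = np.array([
    [2, 0, 1],
    [1, 1, 0],
    [0, 3, 0],
    [0, 1, 2],
    [1, 0, 2],
], dtype=float)

# Normalize each document vector to unit Euclidean norm.
X = counts / np.linalg.norm(counts, axis=0)

# G[i, j] is then the cosine similarity between documents i and j.
G = X.T @ X
print(np.round(G, 3))
```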