Singular value theorem for linear transformations

The singular value theorem for linear transformations, or the singular value decomposition (SVD) theorem for matrices, is widely used in many areas. In this post we explain the theorem and give its proof.

Motivation

Let \(T\colon V\to V\) be a linear operator on a finite-dimensional inner product space \(V\) over \(F=\mathbf{R}\) or \(\mathbf{C}\). We have seen that

  • for \(F=\mathbf{C}\), if \(T\) is normal, then there exists an orthonormal basis for \(V\) consisting of eigenvectors of \(T\);
  • for \(F=\mathbf{R}\), if \(T\) is self-adjoint, then there exists an orthonormal basis for \(V\) consisting of eigenvectors of \(T\).

We can interpret the above statements as follows. The eigenvalues of \(T\) measure how much the eigenvectors in the corresponding orthonormal basis are distorted; that is,

\[T(v)=\lambda v.\]
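
These statements are easy to verify numerically. Here is a minimal sketch, assuming NumPy; the Hermitian matrix \(A\) below is an arbitrary example. The routine np.linalg.eigh is designed for self-adjoint matrices and returns real eigenvalues together with an orthonormal basis of eigenvectors.

```python
import numpy as np

rng = np.random.default_rng(0)

# An arbitrary self-adjoint (Hermitian) matrix: B + B^* is always Hermitian.
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
A = B + B.conj().T

# eigh returns real eigenvalues and eigenvectors as the orthonormal columns of Q.
eigvals, Q = np.linalg.eigh(A)

print(np.allclose(Q.conj().T @ Q, np.eye(4)))    # True: columns are orthonormal
print(np.allclose(A @ Q, Q @ np.diag(eigvals)))  # True: A Q = Q diag(lambda)
```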

Our goal is to generalize the above statements to general linear maps \(T\colon V\to W\) between two possibly different inner product spaces. Of course, in this situation, it does not make sense to talk about eigenvalues and eigenvectors. But still, we can measure how much the vectors in two different orthonormal bases (one for \(V\) and another for \(W\)) are distorted.

Implementing the idea

For this purpose, we search for nice orthonormal bases \(\beta\) for \(V\) and \(\gamma\) for \(W\), hoping that \(T\) takes vectors in \(\beta\) to scalar multiples of vectors in \(\gamma\). Suppose \(\beta=\{v_{1},\ldots,v_{n}\}\) and \(\gamma=\{w_{1},\ldots,w_{m}\}\). At the very least, we would like the images \(T(v_{1}),\ldots,T(v_{n})\) to be mutually orthogonal, that is,

\[\langle T(v_{i}),T(v_{j})\rangle_{W}=0~\mbox{for}~i\ne j.\]

In other words,

\[\langle T^{\ast}T(v_{i}),v_{j}\rangle_{V}=0~\mbox{for}~i\ne j.\]

Note that \(T^{\ast}T\) is self-adjoint and positive semidefinite; in particular, its eigenvalues are real and nonnegative. We can thus simply pick \(\beta\) to be an orthonormal basis for \(V\) consisting of eigenvectors of \(T^{\ast}T\). Write \(\beta=\{v_{1},\ldots,v_{n}\}\). We can arrange the \(v_{i}\) according to the magnitude of the corresponding eigenvalue \(\lambda_{i}\); that is, we assume that

\[\lambda_{1}\ge\lambda_{2}\ge\cdots \ge\lambda_{n-1}\ge\lambda_{n}.\]

Now let \(r=\mathrm{rank}(T)\). Since \(\langle T^{\ast}T(v),v\rangle_{V}=\lVert T(v)\rVert^{2}\), the operators \(T^{\ast}T\) and \(T\) have the same null space, so \(r=\mathrm{rank}(T^{\ast}T)\) and

\[\lambda_{1}\ge\lambda_{2}\ge\cdots \ge\lambda_{r}>\lambda_{r+1}=\cdots=\lambda_{n}=0.\]

Moreover,

\[\langle T(v_{i}),T(v_{j})\rangle_{W}=\langle T^{\ast}T(v_{i}),v_{j}\rangle_{V}=\lambda_{i}\delta_{ij}.\]

In particular, \(\lVert T(v_{i})\rVert^{2}=\lambda_{i}>0\), so \(T(v_{i})\) is non-zero for \(1\le i\le r\).

If we set \(w_{i}:=T(v_{i})/\sqrt{\lambda_{i}}\) for \(1\le i\le r\), then

\[\langle w_{i},w_{j}\rangle_{W}=\delta_{ij}~\mbox{for}~1\le i,j\le r.\]

Said differently, the set \(\{w_{1},\ldots,w_{r}\}\) is an orthonormal subset of \(W\), and we can extend it to an orthonormal basis

\[\gamma=\{w_{1},\ldots,w_{r},w_{r+1},\ldots,w_{m}\}\]

for \(W\). We then arrive at the following theorem.

Theorem. Let \(V\) and \(W\) be finite-dimensional inner product spaces and \(T\colon V\to W\) be a linear map of rank \(r\). Then there exist an orthonormal basis \(\beta=\{v_{1},\ldots,v_{n}\}\) for \(V\), an orthonormal basis \(\gamma=\{w_{1},\ldots,w_{m}\}\) for \(W\), and positive scalars \(\sigma_{1}\ge\cdots\ge\sigma_{r}>0\) such that

\[T(v_{i})=\begin{cases} \sigma_{i}w_{i},~&\mbox{for}~1\le i\le r\\ \mathbf{0},~&\mbox{otherwise}. \end{cases}\]

Conversely, suppose the previous conditions are satisfied. Then each \(v_{i}\) is an eigenvector of \(T^{\ast}T\), with corresponding eigenvalue \(\sigma_{i}^{2}\) for \(1\le i\le r\) and \(0\) for \(r+1\le i\le n\). Therefore, the scalars \(\sigma_{i}\) are uniquely determined by \(T\).
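
Before turning to the proof, here is a numerical sketch of the construction above, assuming NumPy; the rank-2 matrix \(A\) (with \(m=5\), \(n=4\)) is an arbitrary test case.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))  # m = 5, n = 4, rank 2

# Step 1: orthonormal eigenvectors of A^T A (A is real, so A* = A^T),
# reordered so that the eigenvalues are decreasing, as in the text.
eigvals, V = np.linalg.eigh(A.T @ A)          # eigh sorts in ascending order
eigvals, V = eigvals[::-1], V[:, ::-1]
r = int(np.sum(eigvals > 1e-12))              # numerical rank; equals rank(A) = 2

# Step 2: w_i = T(v_i) / sqrt(lambda_i) for 1 <= i <= r gives an orthonormal set.
W_r = A @ V[:, :r] / np.sqrt(eigvals[:r])
print(np.allclose(W_r.T @ W_r, np.eye(r)))    # True

# Step 3: extend {w_1, ..., w_r} to an orthonormal basis of R^5 by
# Gram-Schmidt against the standard basis vectors.
basis = [W_r[:, i] for i in range(r)]
for e in np.eye(5):
    u = e - sum(np.dot(b, e) * b for b in basis)
    if np.linalg.norm(u) > 1e-10:
        basis.append(u / np.linalg.norm(u))
gamma = np.column_stack(basis)                  # columns: w_1, ..., w_r, ..., w_5
print(np.allclose(gamma.T @ gamma, np.eye(5)))  # True: gamma is orthonormal

# The theorem's conclusion: T(v_i) = sigma_i w_i with sigma_i = sqrt(lambda_i).
print(np.allclose(A @ V[:, :r], W_r * np.sqrt(eigvals[:r])))  # True
```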

Proof. The first part follows from the construction above by taking \(\sigma_{i}=\sqrt{\lambda_{i}}\). For the converse, compute

\[\langle T^{\ast}(w_{i}),v_{j}\rangle_{V}=\langle w_{i},T(v_{j})\rangle_{W} =\begin{cases} \sigma_{i}\delta_{ij},~&\mbox{for}~1\le i\le r\\ 0,~&\mbox{otherwise}. \end{cases}\]

Hence we have

\[T^{\ast}(w_{i})=\begin{cases} \sigma_{i}v_{i},~&\mbox{for}~1\le i\le r\\ \mathbf{0},~&\mbox{otherwise}. \end{cases}\]

We then infer that for \(1\le i\le r\)

\[T^{\ast}T(v_{i})=\sigma_{i}^{2}v_{i}\]

and \(T^{\ast}T(v_{i})=\mathbf{0}\) for \(i>r\).
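
The computations in the proof can be checked numerically as well. In the following sketch (again assuming NumPy), np.linalg.svd supplies the bases: the columns of \(U\) play the role of the \(w_{i}\) and the rows of \(Vh\) the role of the \(v_{i}\).

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))  # rank 2

# np.linalg.svd factors A = U @ diag(s) @ Vh with s in decreasing order.
U, s, Vh = np.linalg.svd(A)
r = int(np.sum(s > 1e-12))

for i in range(r):
    v, w, sigma = Vh[i], U[:, i], s[i]
    print(np.allclose(A @ v, sigma * w),             # T(v_i)   = sigma_i w_i
          np.allclose(A.T @ w, sigma * v),           # T*(w_i)  = sigma_i v_i
          np.allclose(A.T @ (A @ v), sigma**2 * v))  # T*T(v_i) = sigma_i^2 v_i
```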

Definition. The scalars \(\sigma_{1},\ldots,\sigma_{r}\) are called the singular values of \(T\). If \(r<k:=\min\{m,n\}\), then the term “singular value” is extended to include \(\sigma_{r+1}=\cdots =\sigma_{k}=0\).
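
For example (a sketch assuming NumPy), for a rank-\(2\) matrix with \(m=5\) and \(n=4\), np.linalg.svd reports \(k=\min\{m,n\}=4\) singular values, the last two of which vanish up to rounding:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))  # rank 2, min(m, n) = 4

s = np.linalg.svd(A, compute_uv=False)  # all min(m, n) singular values
print(np.round(s, 12))                  # two positive values, then zeros
```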

Remark. From the proof, we see that the singular values of \(T\) and the singular values of its adjoint \(T^{\ast}\) are identical. Indeed, we have seen that

\[T^{\ast}(w_{i})=\begin{cases} \sigma_{i}v_{i},~&\mbox{for}~1\le i\le r\\ \mathbf{0},~&\mbox{otherwise}. \end{cases}\]
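
Applying the theorem to \(T^{\ast}\), with the roles of \(\beta\) and \(\gamma\) interchanged, therefore yields the same scalars \(\sigma_{1},\ldots,\sigma_{r}\). As a quick numerical check (a sketch assuming NumPy; the complex test matrix is an arbitrary example):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 3)) + 1j * rng.standard_normal((5, 3))

s = np.linalg.svd(A, compute_uv=False)                # singular values of T
s_adj = np.linalg.svd(A.conj().T, compute_uv=False)   # singular values of T*
print(np.allclose(s, s_adj))                          # True: identical values
```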