\(
\newcommand\norm[1]{|| #1 ||}
\newcommand\nnz[1]{{\textrm{nnz}(#1)}}
\newcommand\E{\mathbf{E}}
\newcommand\Dist{\mathbf{Dist}}
\newcommand\reals{{\mathrm{I\!R}}}
\newcommand\R{{\mathrm{I\!R}}}
\newcommand\poly{{\textrm{poly}}}
\newcommand\rank{{\textrm{rank}}}
\newcommand\colspan{{\textrm{range}}}
\newcommand\diam{{\textrm{diam}}}
\newcommand\polylog{{\textrm{polylog}}}
\newcommand\rowspan{{\textrm{rowspan}}}
\newcommand\colbasis{{\textrm{colbasis}}}
\newcommand\argmin{{\mathrm{argmin}}}
\newcommand\Err{{\mathtt{NE}}}
\newcommand\eps{{\epsilon}}
\newcommand\bfone{{\mathbf{1}}}
\newcommand\Var{{\mathbf{Var}}}
\newcommand\stdev[1]{{\mathbf{stdev(#1)}}}
\newcommand\cgf{{\mathbf{CGF}}}
\newcommand\Iden{{\mathbf{I}}}
\newcommand\Prob{{\mathbf{Pr}}}
\DeclareMathOperator\subg{\mathbf{subG}}
\newcommand\cN{{\cal N}}
\newcommand\cS{{\cal S}}
\newcommand\cD{{\cal D}}
\newcommand\cM{{\cal M}}
\newcommand\cE{{\cal E}}
\newcommand\cC{{\cal C}}
\newcommand\KJL{K_{\mathbb{JL}}}
\newcommand\tO{{\tilde O}}
\newcommand\tx{{\tilde{\mathbf{x}}}}
\newcommand\tX{{\tilde{\mathbf{X}}}}
\newcommand\tY{{\tilde{\mathbf{Y}}}}
\newcommand\tA{{\tilde{\mathbf{A}}}}
\newcommand\bA{{\mathbf{A}}}
\newcommand\ba{{\mathbf{a}}}
\newcommand\bw{{\mathbf{w}}}
\newcommand\bW{{\mathbf{W}}}
\newcommand\bR{{\mathbf{R}}}
\newcommand\bB{{\mathbf{B}}}
\newcommand\bQ{{\mathbf{Q}}}
\newcommand\bZ{{\mathbf{Z}}}
\newcommand\bz{{\mathbf{z}}}
\newcommand\bU{{\mathbf{U}}}
\newcommand\bV{{\mathbf{V}}}
\newcommand\bX{{\mathbf{X}}}
\newcommand\bS{{\mathbf{S}}}
\newcommand\hS{\hat {\mathbf{S}}}
\newcommand\bs{{\mathbf{s}}}
\newcommand\bx{{\mathbf{x}}}
\newcommand\by{{\mathbf{y}}}
\newcommand\bb{{\mathbf{b}}}
\newcommand\bT{{\mathbf{T}}}
\newcommand\bP{{\mathbf{P}}}
\newcommand\bG{{\mathbf{G}}}
\newcommand\bY{{\mathbf{Y}}}
\newcommand\hY{\hat {\mathbf{Y}}}
\newcommand\bH{{\mathbf{H}}}
\newcommand\bD{{\mathbf{D}}}
\newcommand\be{{\mathbf{e}}}
\newcommand\br{{\mathbf{r}}}
\newcommand\bSig{{\mathbf{\Sigma}}}
\newcommand\bPi{{\mathbf{\Pi}}}
\newcommand\hphi{{\hat\phi}}
\)
Sketching for Matrix Computations:
Low-rank Approximation
Ken Clarkson
IBM Research, Almaden
- These slides at
http://kenclarkson.org/G2S3_2015/
- Enable javascript, navigate as in powerpoint
- Math may take some time to render properly; if it doesn't end up looking right, try reloading
Low-rank approximation: bi-criteria
Let
- `\bA\in\R^{n\times d}`
- integer `k\ge 1`, `\eps\in (0,1)`
- `\Delta_k\equiv \min_{\rank(\bY)=k} \norm{\bA-\bY}_F`
There are sketching matrices
- `\bS\in\R^{m\times n}`, `\hS\in\R^{d\times m'}` with
- `m,m'=\poly(k/\eps)`
such that with constant probability,
$$\norm{\bA\hS(\bS \bA \hS)^+\bS\bA - \bA}_F\le (1+\eps)\Delta_k,$$
computable in `O(\nnz{\bA}) + \poly(k/\eps)` time.
- Rank `m=O(k(\eps^{-1}+k^\gamma))`, can be smaller via other embeddings
- Analysis via extension of regression results, to follow
- A "bi-criteria" result: both rank and error are larger than their targets
- One pass over the data
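As a sanity check, the bi-criteria formula can be exercised in a few lines of numpy. This is only a sketch of the structure: dense Gaussian matrices stand in for the sparse embeddings (so the `O(\nnz{\bA})` running time is not illustrated), and the sketch sizes `m, m'` are picked ad hoc rather than from the `\poly(k/\eps)` bounds.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 300, 200, 5

# Test matrix: rank k plus small noise, so Delta_k is small but nonzero.
A = rng.standard_normal((n, k)) @ rng.standard_normal((k, d))
A += 0.01 * rng.standard_normal((n, d))

# Dense Gaussian sketches stand in for fast sparse embeddings.
m, mp = 6 * k, 24 * k
S = rng.standard_normal((m, n)) / np.sqrt(m)     # S in R^{m x n}
Sh = rng.standard_normal((d, mp)) / np.sqrt(mp)  # S-hat in R^{d x m'}

# Bi-criteria approximation A S^ (S A S^)^+ S A; its rank is at most m.
approx = A @ Sh @ np.linalg.pinv(S @ A @ Sh) @ (S @ A)

# Compare against Delta_k, the best rank-k error, from the SVD of A.
sv = np.linalg.svd(A, compute_uv=False)
delta_k = np.sqrt((sv[k:] ** 2).sum())
err = np.linalg.norm(A - approx, "fro")
print(err / delta_k)  # near 1: rank-m approx with near-best rank-k error
```

Note the bi-criteria aspect directly: `approx` has rank up to `m > k`, yet its error is compared against the best rank-`k` error.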
Low-rank approximation: rank `k`
More work to get rank-`k` with the same sketches:
Let
- `\bA\in\R^{n\times d}`
- integer `k\ge 1`, `\eps\in (0,1)`
- `\Delta_k\equiv \min_{\rank(\bY)=k} \norm{\bA-\bY}_F`
There are sketching matrices
- `\bS\in\R^{m\times n}`, `\hS\in\R^{d\times m'}` with
- `m,m'=\poly(k/\eps)`
such that with constant probability,
$$\tY \equiv \argmin_{\rank(\bY)=k}\norm{\bY\bS\bA\hS - \bA\hS}_F$$
has
$$\norm{\tY\bS\bA - \bA}_F \le (1+\eps)\Delta_k.$$
Here `\tY` is computable in `O(\nnz{\bA}) + n\cdot\poly(k/\eps)` time.
- Without the rank constraint, the optimal `\bY` is `\bA\hS(\bS \bA \hS)^+`
- `n` in running time can be `\min\{n,d\}`
Low-rank approximation
We reduce to easier problems, up to `O(\eps)` relative error:
$$
\begin{align*}
\min_{\rank(\bY_1)=k} & \norm{\bY_1 - \bA}_F & & \text{ }
\\ & \longrightarrow \min_{\rank(\bY_2)=k} \norm{\bY_2\bS\bA - \bA}_F & & \text{"}\bS\bA \text{ is good"}
\\ & \longrightarrow \min_{\rank(\bY_3)=k} \norm{\bY_3\bS\bA\hS - \bA\hS}_F & & \text{sketching on the right}
\\ & \longrightarrow \min_{\rank(\bY_4)=k} \norm{\bY_4 - \bA\hS\bV}_F & & \text{best rank-}k\text{ of proj.}
\end{align*}
$$
- The `\bV` in the last problem is from `\bS\bA\hS=\bU\bSig\bV^\top`
- Use: `\bY^*_2\bS\bA` as `\bY^*_1`, `\bY^*_3` as `\bY^*_2`, `\bY^*_4\bSig^{-1} \bU^\top` as `\bY^*_3`
- TBD: why these reductions work
- The first two involve sketches,
the last is "just linear algebra"
Low-rank approximation: `\bS\bA` is good
Sketchspace containing a rank-`k` approx
If
- `\bA\in\R^{n\times d}`
- integer `k\ge 1`, `\eps\in (0,1)`
- `\bA_k \equiv\argmin_{\rank(\bY)=k} \norm{\bY - \bA}_F = \bU_k\bSig_k\bV_k^\top`
- `\bS ` with `\Err_\bS(\bW ,\bW ')\le\frac23\sqrt{\eps/k}`, for fixed `\bW ,\bW '`
- `\bS ` a `(1/9)`-embedding for `\bU_k`
Then
$$\min_{\rank(\bY)=k}\norm{\bY\bS\bA - \bA}_F\le (1+\eps)\Delta_k,$$
where `\Delta_k\equiv \norm{\bA-\bA_k}_F`.
Claim: `\tY = \bU_k(\bS\bU_k)^+` has `\tY\bS\bA` close enough to `\bA`
Low-rank approximation: `\tY` is good
It's enough to show that
`\tA= \tY\bS\bA`, where `\tY = \bU_k(\bS\bU_k)^+`, satisfies
$$\norm{\bA -\tA}_F\le (1+\eps)\norm{\bA -\bA_k}_F.$$
Another view of `\bA_k`: for
$$\bX _* = \argmin_{\bX\in\R^{k\times d}}\norm{\bU_k\bX -\bA }_F = \class{tied_highlightee1}{\bU_k^\top\bA},$$
`\bA_k = \bU_k\bX _* = \bU_k (\bU_k^\top\bA )`.
Sketched:
$$\tX = \argmin_{\bX\in\R^{k\times d}}\norm{\bS (\bU_k\bX -\bA )}_F = \class{tied_highlightee1}{(\bS\bU_k)^+\bS\bA}.$$
Using the multiple-response regression result:
$$
\begin{align*}
\norm{\tA-\bA }_F & = \norm{\bU_k\tX -\bA }_F\le (1+\eps)\norm{\bU_k\bX _* -\bA }_F
\\ & = (1+\eps)\norm{\bA_k-\bA }_F,
\end{align*}$$
as claimed.
Here `\tY\bS\bA = [\bU_k(\bS\bU_k)^+]\bS\bA = \bU_k[(\bS\bU_k)^+\bS\bA]= \bU_k\tX`
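The claim can be spot-checked numerically. The snippet below (an illustration with a dense Gaussian `\bS` in place of a fast sparse embedding, and sizes chosen ad hoc) computes the sketched solution `\tX = (\bS\bU_k)^+\bS\bA` and compares `\norm{\bU_k\tX - \bA}_F` against the optimum `\Delta_k`.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, k = 400, 60, 4
A = rng.standard_normal((n, k)) @ rng.standard_normal((k, d)) \
    + 0.05 * rng.standard_normal((n, d))

# Top-k left singular vectors U_k, and the optimal residual Delta_k.
U, _, _ = np.linalg.svd(A, full_matrices=False)
Uk = U[:, :k]
delta_k = np.linalg.norm(A - Uk @ (Uk.T @ A), "fro")

# Sketched multiple-response regression: X~ = (S U_k)^+ S A.
m = 10 * k
S = rng.standard_normal((m, n)) / np.sqrt(m)
Xt = np.linalg.pinv(S @ Uk) @ (S @ A)
sketched = np.linalg.norm(A - Uk @ Xt, "fro")
print(sketched / delta_k)  # at least 1, and not much more
```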
Low-rank approximation:
sketching on the right
So it is enough to find (as above)
\begin{equation}
\argmin_{\rank(\bY_2 )=k}\norm{\bY_2\bS\bA -\bA }_F,
\label{eq:foo2}
\end{equation}
but this is still a big problem.
There is a sketching matrix `\hS`,
with the affine embedding property, such that
$$\argmin_{\rank(\bY_3 )=k}\norm{\bY_3\bS\bA\hS -\bA\hS}_F$$
is an `\eps`-approx solution to `\eqref{eq:foo2}`.
(As above, related to the bi-criteria result: `\hS` such that
$$\bY = \bA\hS(\bS\bA\hS)^+\quad\eps\text{-approx solves}\quad\min_\bY\norm{\bY\bS\bA -\bA }_F,$$
so
$$\bA\hS(\bS\bA\hS)^+\bS\bA $$
is rank-`m`, `m= \poly(k/\eps)`, and as good as the best rank-`k` matrix.)
Affine embeddings
We will transpose to align with other definitions.
Given `\bA\in\R^{n\times d},\bB\in\R^{n\times d'}`, a matrix `\bT ` is an
affine embedding for `\bA ,\bB ` if
$$\norm{\bT (\bA\bX -\bB )}_F^2\approx_\eps\norm{\bA\bX -\bB }_F^2$$
for all `\bX\in\R^{d\times d'}`.
Like a subspace embedding, but "translated by `\bB `".
We have:
If
- `\bA\in\R^{n\times d}` with `\rank(\bA )=r`, `\bB\in\R^{n\times d'}`
- `\bT ` with `\Err_\bT(\bW ,\bW ')\le K_1\eps/\sqrt{r}`, for fixed `\bW ,\bW '`, constant `K_1`
- `\bT ` a subspace `K_2\eps`-embedding for `\bA `, constant `K_2`
Then
`\bT ` is an affine embedding for `\bA ,\bB `,
when `K_1` and `K_2` are small enough.
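A numerical spot check of the definition (not a proof, since an affine embedding must work for *every* `\bX` simultaneously): with a Gaussian `\bT` of modest height, chosen here for convenience, the sketched and true residual norms agree closely for random `\bX`.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, dp = 500, 6, 8
A = rng.standard_normal((n, d))
B = rng.standard_normal((n, dp))

# Gaussian sketch T; rows scaled so norms are preserved in expectation.
m = 300
T = rng.standard_normal((m, n)) / np.sqrt(m)

# Ratio ||T(AX - B)||_F / ||AX - B||_F for a few random X.
ratios = []
for _ in range(20):
    X = rng.standard_normal((d, dp))
    R = A @ X - B
    ratios.append(np.linalg.norm(T @ R, "fro") / np.linalg.norm(R, "fro"))
print(min(ratios), max(ratios))  # both near 1
```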
Applying affine embeddings on the right
To apply the affine embedding result as needed for low-rank approx, map:
- `\bA` in the affine embedding theorem to `(\bS\bA)^\top`
- `\bB` to `\bA^\top`
- `\bX` to `\bY_2^\top`
- `\bT` to `\hS^\top`
- `r` to `\rank(\bS\bA) = \poly(k/\eps)\quad` (so `m'=\poly(k/\eps)`)
Affine embeddings:
- Also allow sketched multiple-response regression (but are overkill)
- Satisfy stronger conditions: approximation to `\norm{\bA\bX-\bB}_F` for all `\bX`
- Require stronger conditions on the sketching matrix
Finding the closest rank-`k`
It remains to solve (as above)
$$\argmin_{\rank(\bY_3 )=k}\norm{\bY_3\bS\bA\hS -\bA\hS}_F.$$
Here `\bS\bA\hS\in\R^{m\times m'}` where `m,m'=\poly(k/\eps)`.
For the SVD `\bS\bA\hS=\bU\bSig\bV^\top`, the problem is
$$
\begin{align*}
\argmin_{\rank(\bY_3 )=k} & \norm{\bY_3\bU\bSig\bV^\top -\bA\hS}_F, \text{ or}
\\ \argmin_{\rank(\bY_4 )=k} & \norm{\bY_4\bV^\top -\bA\hS}_F,
\end{align*}
$$
where `\bY_3 = \bY_4\bSig^{-1}\bU^\top`.
Since `\bV` is orthonormal,
$$\norm{\bY_4\bV^\top -\bA\hS}_F = \norm{(\bY_4\bV^\top -\bA\hS)\bV}_F
= \norm{\bY_4 - \bA\hS\bV}_F,$$
where `\bA\hS\bV` can be computed in `n\cdot\poly(k/\eps)` time,
yielding a rank-`k` approximation problem on a `\poly(k/\eps)\times \poly(k/\eps)`
matrix.
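Putting the chain together: the sketch below (again with dense Gaussian `\bS`, `\hS` and ad hoc sizes, so only the structure of the reduction is illustrated, not the `\nnz{\bA}` running time) takes `\bY_4` as the best rank-`k` approximation of `\bA\hS\bV`, maps back via `\bY_3 = \bY_4\bSig^{-1}\bU^\top`, and checks the error of the final rank-`k` approximation `\bY_3\bS\bA` against `\Delta_k`.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, k = 300, 250, 5
A = rng.standard_normal((n, k)) @ rng.standard_normal((k, d)) \
    + 0.01 * rng.standard_normal((n, d))

# Dense Gaussian sketches in place of fast sparse embeddings.
m, mp = 6 * k, 24 * k
S = rng.standard_normal((m, n)) / np.sqrt(m)
Sh = rng.standard_normal((d, mp)) / np.sqrt(mp)

# SVD of the small sketched matrix S A S^ (m x m').
U, Sig, Vt = np.linalg.svd(S @ A @ Sh, full_matrices=False)

# Project: M = A S^ V; take its best rank-k approximation as Y4.
M = A @ Sh @ Vt.T
Um, sm, Vmt = np.linalg.svd(M, full_matrices=False)
Y4 = (Um[:, :k] * sm[:k]) @ Vmt[:k]

# Map back: Y3 = Y4 Sig^{-1} U^T, then the final approximation Y3 S A.
Y3 = Y4 @ np.diag(1.0 / Sig) @ U.T
approx = Y3 @ (S @ A)  # rank at most k

sv = np.linalg.svd(A, compute_uv=False)
delta_k = np.sqrt((sv[k:] ** 2).sum())
err = np.linalg.norm(A - approx, "fro")
print(err / delta_k)  # at least 1, since rank(approx) <= k
```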