\(
\newcommand\norm[1]{|| #1 ||}
\newcommand\nnz[1]{{\textrm{nnz}(#1)}}
\newcommand\E{\mathbf{E}}
\newcommand\Dist{\mathbf{Dist}}
\newcommand\reals{{\mathrm{I\!R}}}
\newcommand\R{{\mathrm{I\!R}}}
\newcommand\poly{{\textrm{poly}}}
\newcommand\rank{{\textrm{rank}}}
\newcommand\colspan{{\textrm{range}}}
\newcommand\diam{{\textrm{diam}}}
\newcommand\polylog{{\textrm{polylog}}}
\newcommand\rowspan{{\textrm{rowspan}}}
\newcommand\colbasis{{\textrm{colbasis}}}
\newcommand\argmin{{\mathrm{argmin}}}
\newcommand\Err{{\mathtt{NE}}}
\newcommand\eps{{\epsilon}}
\newcommand\bfone{{\mathbf{1}}}
\newcommand\Var{{\mathbf{Var}}}
\newcommand\stdev[1]{{\mathbf{stdev(#1)}}}
\newcommand\cgf{{\mathbf{CGF}}}
\newcommand\Iden{{\mathbf{I}}}
\newcommand\Prob{{\mathbf{Pr}}}
\DeclareMathOperator\subg{\mathbf{subG}}
\newcommand\cN{{\cal N}}
\newcommand\cS{{\cal S}}
\newcommand\cD{{\cal D}}
\newcommand\cM{{\cal M}}
\newcommand\cE{{\cal E}}
\newcommand\cC{{\cal C}}
\newcommand\KJL{K_{\mathbb{JL}}}
\newcommand\tO{{\tilde O}}
\newcommand\tx{{\tilde{\mathbf{x}}}}
\newcommand\tX{{\tilde{\mathbf{X}}}}
\newcommand\tY{{\tilde{\mathbf{Y}}}}
\newcommand\tA{{\tilde{\mathbf{A}}}}
\newcommand\bA{{\mathbf{A}}}
\newcommand\ba{{\mathbf{a}}}
\newcommand\bw{{\mathbf{w}}}
\newcommand\bW{{\mathbf{W}}}
\newcommand\bR{{\mathbf{R}}}
\newcommand\bB{{\mathbf{B}}}
\newcommand\bQ{{\mathbf{Q}}}
\newcommand\bZ{{\mathbf{Z}}}
\newcommand\bz{{\mathbf{z}}}
\newcommand\bU{{\mathbf{U}}}
\newcommand\bV{{\mathbf{V}}}
\newcommand\bX{{\mathbf{X}}}
\newcommand\bS{{\mathbf{S}}}
\newcommand\hS{\hat {\mathbf{S}}}
\newcommand\bs{{\mathbf{s}}}
\newcommand\bx{{\mathbf{x}}}
\newcommand\by{{\mathbf{y}}}
\newcommand\bb{{\mathbf{b}}}
\newcommand\bT{{\mathbf{T}}}
\newcommand\bP{{\mathbf{P}}}
\newcommand\bG{{\mathbf{G}}}
\newcommand\bY{{\mathbf{Y}}}
\newcommand\hY{\hat {\mathbf{Y}}}
\newcommand\bH{{\mathbf{H}}}
\newcommand\bD{{\mathbf{D}}}
\newcommand\be{{\mathbf{e}}}
\newcommand\br{{\mathbf{r}}}
\newcommand\bSig{{\mathbf{\Sigma}}}
\newcommand\bPi{{\mathbf{\Pi}}}
\newcommand\hphi{{\hat\phi}}
\)
Sketching for Matrix Computations:
Low-rank Approximation
Ken Clarkson
IBM Research, Almaden
- These slides at
http://kenclarkson.org/G2S3_2015/
- Enable javascript, navigate as in powerpoint
- Math may take some time to render properly; if it doesn't end up looking right, try reloading
Low-rank approximation: bi-criteria
Let
- `\bA\in\R^{n\times d}`
- integer `k\ge 1`, `\eps\in (0,1)`
- `\Delta_k\equiv \min_{\rank(\bY)=k} \norm{\bA-\bY}_F`
There are sketching matrices
- `\bS\in\R^{m\times n}`, `\hS\in\R^{d\times m'}` with
- `m,m'=\poly(k/\eps)`
such that with constant probability,
$$\norm{\bA\hS(\bS \bA \hS)^+\bS\bA - \bA}_F\le (1+\eps)\Delta_k,$$
computable in `O(\nnz{\bA}) + \poly(k/\eps)` time.
- Rank `m=O(k(\eps^{-1}+k^\gamma))`, can be smaller via other embeddings
- Analysis via extension of regression results, to follow
- A "bi-criteria" result: both rank and error are larger than their targets
- One pass over the data
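As a sanity check, the bi-criteria formula can be exercised in a few lines of numpy. This is only a sketch of the structure: dense Gaussian matrices stand in for the sparse embeddings (so the `O(\nnz{\bA})` running time is not illustrated), and the sketch sizes `m, m'` are picked ad hoc rather than from the `\poly(k/\eps)` bounds.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 300, 200, 5

# Test matrix: rank k plus small noise, so Delta_k is small but nonzero.
A = rng.standard_normal((n, k)) @ rng.standard_normal((k, d))
A += 0.01 * rng.standard_normal((n, d))

# Dense Gaussian sketches stand in for fast sparse embeddings.
m, mp = 6 * k, 24 * k
S = rng.standard_normal((m, n)) / np.sqrt(m)     # S in R^{m x n}
Sh = rng.standard_normal((d, mp)) / np.sqrt(mp)  # S-hat in R^{d x m'}

# Bi-criteria approximation A S^ (S A S^)^+ S A; its rank is at most m.
approx = A @ Sh @ np.linalg.pinv(S @ A @ Sh) @ (S @ A)

# Compare against Delta_k, the best rank-k error, from the SVD of A.
sv = np.linalg.svd(A, compute_uv=False)
delta_k = np.sqrt((sv[k:] ** 2).sum())
err = np.linalg.norm(A - approx, "fro")
print(err / delta_k)  # near 1: rank-m approx with near-best rank-k error
```

Note the bi-criteria aspect directly: `approx` has rank up to `m > k`, yet its error is compared against the best rank-`k` error.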
Low-rank approximation: rank `k`
More work to get rank-`k` with the same sketches:
Let
- `\bA\in\R^{n\times d}`
- integer `k\ge 1`, `\eps\in (0,1)`
- `\Delta_k\equiv \min_{\rank(\bY)=k} \norm{\bA-\bY}_F`
There are sketching matrices
- `\bS\in\R^{m\times n}`, `\hS\in\R^{d\times m'}` with
- `m,m'=\poly(k/\eps)`
such that with constant probability,
$$\tY \equiv \argmin_{\rank(\bY)=k}\norm{\bY\bS\bA\hS - \bA\hS}_F$$
has
$$\norm{\tY\bS\bA - \bA}_F \le (1+\eps)\Delta_k.$$
Here `\tY` is computable in `O(\nnz{\bA}) + n\cdot\poly(k/\eps)` time.
- Without the rank constraint, the optimal `\bY` is `\bA\hS(\bS \bA \hS)^+`
- `n` in running time can be `\min\{n,d\}`
Low-rank approximation
We reduce to easier problems, up to `O(\eps)` relative error:
$$
\begin{align*}
\min_{\rank(\bY_1)=k} & \norm{\bY_1 - \bA}_F & & \text{ }
\\ & \longrightarrow \min_{\rank(\bY_2)=k} \norm{\bY_2\bS\bA - \bA}_F & & \text{"}\bS\bA \text{ is good"}
\\ & \longrightarrow \min_{\rank(\bY_3)=k} \norm{\bY_3\bS\bA\hS - \bA\hS}_F & & \text{sketching on the right}
\\ & \longrightarrow \min_{\rank(\bY_4)=k} \norm{\bY_4 - \bA\hS\bV}_F & & \text{best rank-}k\text{ of proj.}
\end{align*}
$$
- The `\bV` in the last problem is from `\bS\bA\hS=\bU\bSig\bV^\top`
- Use: `\bY^*_2\bS\bA` as `\bY^*_1`, `\bY^*_3` as `\bY^*_2`, `\bY^*_4\bSig^{-1} \bU^\top` as `\bY^*_3`
- TBD: why these reductions work
- The first two involve sketches,
the last is "just linear algebra"
Low-rank approximation: `\bS\bA` is good
Sketchspace containing a rank-`k` approx
If
- `\bA\in\R^{n\times d}`
- integer `k\ge 1`, `\eps\in (0,1)`
- `\bA_k \equiv\argmin_{\rank(\bY)=k} \norm{\bY - \bA}_F = \bU_k\bSig_k\bV_k^\top`
- `\bS ` with `\Err_\bS(\bW ,\bW ')\le\frac23\sqrt{\eps/k}`, for fixed `\bW ,\bW '`
- `\bS ` a `(1/9)`-embedding for `\bU_k`
Then
$$\min_{\rank(\bY)=k}\norm{\bY\bS\bA - \bA}_F\le (1+\eps)\Delta_k,$$
where `\Delta_k\equiv \norm{\bA-\bA_k}_F`.
Claim: `\tY = \bU_k(\bS\bU_k)^+` has `\tY\bS\bA` close enough to `\bA`
Low-rank approximation: `\tY` is good
It's enough to show that
`\tA= \tY\bS\bA`, where `\tY = \bU_k(\bS\bU_k)^+`, satisfies
$$\norm{\bA -\tA}_F\le (1+\eps)\norm{\bA -\bA_k}_F.$$
Another view of `\bA_k`: for
$$\bX _* = \argmin_{\bX\in\R^{k\times d}}\norm{\bU_k\bX -\bA }_F = \class{tied_highlightee1}{\bU_k^\top\bA},$$
`\bA_k = \bU_k\bX _* = \bU_k (\bU_k^\top\bA )`.
Sketched:
$$\tX = \argmin_{\bX\in\R^{k\times d}}\norm{\bS (\bU_k\bX -\bA )}_F = \class{tied_highlightee1}{(\bS\bU_k)^+\bS\bA}.$$
Using the multiple-response regression result:
$$
\begin{align*}
\norm{\tA-\bA }_F & = \norm{\bU_k\tX -\bA }_F\le (1+\eps)\norm{\bU_k\bX _* -\bA }_F
\\ & = (1+\eps)\norm{\bA_k-\bA }_F,
\end{align*}$$
as claimed.
Here `\tY\bS\bA = [\bU_k(\bS\bU_k)^+]\bS\bA = \bU_k[(\bS\bU_k)^+\bS\bA]= \bU_k\tX`
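The claim can be spot-checked numerically. The snippet below (an illustration with a dense Gaussian `\bS` in place of a fast sparse embedding, and sizes chosen ad hoc) computes the sketched solution `\tX = (\bS\bU_k)^+\bS\bA` and compares `\norm{\bU_k\tX - \bA}_F` against the optimum `\Delta_k`.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, k = 400, 60, 4
A = rng.standard_normal((n, k)) @ rng.standard_normal((k, d)) \
    + 0.05 * rng.standard_normal((n, d))

# Top-k left singular vectors U_k, and the optimal residual Delta_k.
U, _, _ = np.linalg.svd(A, full_matrices=False)
Uk = U[:, :k]
delta_k = np.linalg.norm(A - Uk @ (Uk.T @ A), "fro")

# Sketched multiple-response regression: X~ = (S U_k)^+ S A.
m = 10 * k
S = rng.standard_normal((m, n)) / np.sqrt(m)
Xt = np.linalg.pinv(S @ Uk) @ (S @ A)
sketched = np.linalg.norm(A - Uk @ Xt, "fro")
print(sketched / delta_k)  # at least 1, and not much more
```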
Low-rank approximation:
sketching on the right
So it is enough to find (as above)
\begin{equation}
\argmin_{\rank(\bY_2 )=k}\norm{\bY_2\bS\bA -\bA }_F,
\label{eq:foo2}
\end{equation}
but this is still a big problem.
There is a sketching matrix `\hS`,
with the affine embedding property, such that
$$\argmin_{\rank(\bY_3 )=k}\norm{\bY_3\bS\bA\hS -\bA\hS}_F$$
is an `\eps`-approx solution to `\eqref{eq:foo2}`.
(As above, related to the bi-criteria result: `\hS` such that
$$\bY = \bA\hS(\bS\bA\hS)^+\quad\eps\text{-approx solves}\quad\min_\bY\norm{\bY\bS\bA -\bA }_F,$$
so
$$\bA\hS(\bS\bA\hS)^+\bS\bA $$
is rank-`m`, `m= \poly(k/\eps)`, and as good as the best rank-`k` matrix.)
Affine embeddings
We will transpose to align with other definitions.
Given `\bA\in\R^{n\times d},\bB\in\R^{n\times d'}`, a matrix `\bT ` is an
affine embedding for `\bA ,\bB ` if
$$\norm{\bT (\bA\bX -\bB )}_F^2\approx_\eps\norm{\bA\bX -\bB }_F^2$$
for all `\bX\in\R^{d\times d'}`.
Like a subspace embedding, but "translated by `\bB `".
We have:
If
- `\bA\in\R^{n\times d}` with `\rank(\bA )=r`, `\bB\in\R^{n\times d'}`
- `\bT ` with `\Err_\bT(\bW ,\bW ')\le K_1\eps/\sqrt{r}`, for fixed `\bW ,\bW '`, constant `K_1`
- `\bT ` a subspace `K_2\eps`-embedding for `\bA `, constant `K_2`
Then
`\bT ` is an affine embedding for `\bA ,\bB `,
when `K_1` and `K_2` are small enough.
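A numerical spot check of the definition (not a proof, since an affine embedding must work for *every* `\bX` simultaneously): with a Gaussian `\bT` of modest height, chosen here for convenience, the sketched and true residual norms agree closely for random `\bX`.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, dp = 500, 6, 8
A = rng.standard_normal((n, d))
B = rng.standard_normal((n, dp))

# Gaussian sketch T; rows scaled so norms are preserved in expectation.
m = 300
T = rng.standard_normal((m, n)) / np.sqrt(m)

# Ratio ||T(AX - B)||_F / ||AX - B||_F for a few random X.
ratios = []
for _ in range(20):
    X = rng.standard_normal((d, dp))
    R = A @ X - B
    ratios.append(np.linalg.norm(T @ R, "fro") / np.linalg.norm(R, "fro"))
print(min(ratios), max(ratios))  # both near 1
```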
Applying affine embeddings on the right
To apply the affine embedding result as needed for low-rank approx, map:
- `\bA` in the affine embedding theorem to `(\bS\bA)^\top`
- `\bB` to `\bA^\top`
- `\bX` to `\bY_2^\top`
- `\bT` to `\hS^\top`
- `r` to `\rank(\bS\bA) = \poly(k/\eps)\quad` (so `m'=\poly(k/\eps)`)
Affine embeddings:
- Also allow sketched multiple-response regression (but are overkill)
- Satisfy stronger conditions: approximation to `\norm{\bA\bX-\bB}_F` for all `\bX`
- Require stronger conditions on the sketching matrix
Finding the closest rank-`k`
It remains to solve (as above)
$$\argmin_{\rank(\bY_3 )=k}\norm{\bY_3\bS\bA\hS -\bA\hS}_F.$$
Here `\bS\bA\hS\in\R^{m\times m'}` where `m,m'=\poly(k/\eps)`.
For the SVD `\bS\bA\hS=\bU\bSig\bV^\top`, the problem is
$$
\begin{align*}
\argmin_{\rank(\bY_3 )=k} & \norm{\bY_3\bU\bSig\bV^\top -\bA\hS}_F, \text{ or}
\\ \argmin_{\rank(\bY_4 )=k} & \norm{\bY_4\bV^\top -\bA\hS}_F,
\end{align*}
$$
where `\bY_3 = \bY_4\bSig^{-1}\bU^\top`.
Since `\bV` is orthonormal,
$$\norm{\bY_4\bV^\top -\bA\hS}_F = \norm{(\bY_4\bV^\top -\bA\hS)\bV}_F
= \norm{\bY_4 - \bA\hS\bV}_F,$$
where `\bA\hS\bV` can be computed in `n\cdot\poly(k/\eps)` time,
yielding a rank-`k` approximation problem on a `\poly(k/\eps)\times \poly(k/\eps)`
matrix.
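Putting the chain together: the sketch below (again with dense Gaussian `\bS`, `\hS` and ad hoc sizes, so only the structure of the reduction is illustrated, not the `\nnz{\bA}` running time) takes `\bY_4` as the best rank-`k` approximation of `\bA\hS\bV`, maps back via `\bY_3 = \bY_4\bSig^{-1}\bU^\top`, and checks the error of the final rank-`k` approximation `\bY_3\bS\bA` against `\Delta_k`.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, k = 300, 250, 5
A = rng.standard_normal((n, k)) @ rng.standard_normal((k, d)) \
    + 0.01 * rng.standard_normal((n, d))

# Dense Gaussian sketches in place of fast sparse embeddings.
m, mp = 6 * k, 24 * k
S = rng.standard_normal((m, n)) / np.sqrt(m)
Sh = rng.standard_normal((d, mp)) / np.sqrt(mp)

# SVD of the small sketched matrix S A S^ (m x m').
U, Sig, Vt = np.linalg.svd(S @ A @ Sh, full_matrices=False)

# Project: M = A S^ V; take its best rank-k approximation as Y4.
M = A @ Sh @ Vt.T
Um, sm, Vmt = np.linalg.svd(M, full_matrices=False)
Y4 = (Um[:, :k] * sm[:k]) @ Vmt[:k]

# Map back: Y3 = Y4 Sig^{-1} U^T, then the final approximation Y3 S A.
Y3 = Y4 @ np.diag(1.0 / Sig) @ U.T
approx = Y3 @ (S @ A)  # rank at most k

sv = np.linalg.svd(A, compute_uv=False)
delta_k = np.sqrt((sv[k:] ** 2).sum())
err = np.linalg.norm(A - approx, "fro")
print(err / delta_k)  # at least 1, since rank(approx) <= k
```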