\(
\newcommand\norm[1]{|| #1 ||}
\newcommand{\myvertiii}[1]{{\vert\kern-0.2ex\vert\kern-0.2ex\vert #1 \vert \kern-0.2ex \vert\kern-0.2ex\vert}}
\newcommand{\norme}[1]{{\myvertiii{#1}}}
\newcommand\nnz[1]{{\textrm{nnz}(#1)}}
\newcommand\E{\mathbf{E}}
\newcommand\Dist{\mathbf{Dist}}
\newcommand\reals{{\mathrm{I\!R}}}
\newcommand\R{{\mathrm{I\!R}}}
\newcommand\poly{{\textrm{poly}}}
\newcommand\rank{{\textrm{rank}}}
\newcommand\colspan{{\textrm{range}}}
\newcommand\diam{{\textrm{diam}}}
\newcommand\polylog{{\textrm{polylog}}}
\newcommand\rowspan{{\textrm{rowspan}}}
\newcommand\colbasis{{\textrm{colbasis}}}
\newcommand\argmin{{\mathrm{argmin}}}
\newcommand\Err{{\mathtt{NE}}}
\newcommand\eps{{\epsilon}}
\newcommand\bfone{{\mathbf{1}}}
\newcommand\Var{{\mathbf{Var}}}
\newcommand\stdev[1]{{\mathbf{stdev(#1)}}}
\newcommand\cgf{{\mathbf{CGF}}}
\newcommand\Iden{{\mathbf{I}}}
\newcommand\Prob{{\mathbf{Pr}}}
\DeclareMathOperator\subg{\mathbf{subG}}
\newcommand\cN{{\cal N}}
\newcommand\cS{{\cal S}}
\newcommand\cD{{\cal D}}
\newcommand\cM{{\cal M}}
\newcommand\cE{{\cal E}}
\newcommand\cC{{\cal C}}
\newcommand\KJL{K_{\mathbb{JL}}}
\newcommand\tO{{\tilde O}}
\newcommand\tx{{\tilde{\mathbf{x}}}}
\newcommand\tX{{\tilde{\mathbf{X}}}}
\newcommand\tY{{\tilde{\mathbf{Y}}}}
\newcommand\tA{{\tilde{\mathbf{A}}}}
\newcommand\bA{{\mathbf{A}}}
\newcommand\ba{{\mathbf{a}}}
\newcommand\bw{{\mathbf{w}}}
\newcommand\bW{{\mathbf{W}}}
\newcommand\bR{{\mathbf{R}}}
\newcommand\bB{{\mathbf{B}}}
\newcommand\bQ{{\mathbf{Q}}}
\newcommand\bZ{{\mathbf{Z}}}
\newcommand\bz{{\mathbf{z}}}
\newcommand\bU{{\mathbf{U}}}
\newcommand\bV{{\mathbf{V}}}
\newcommand\bX{{\mathbf{X}}}
\newcommand\bS{{\mathbf{S}}}
\newcommand\hS{\hat {\mathbf{S}}}
\newcommand\bs{{\mathbf{s}}}
\newcommand\bx{{\mathbf{x}}}
\newcommand\by{{\mathbf{y}}}
\newcommand\bb{{\mathbf{b}}}
\newcommand\bT{{\mathbf{T}}}
\newcommand\bP{{\mathbf{P}}}
\newcommand\bG{{\mathbf{G}}}
\newcommand\bY{{\mathbf{Y}}}
\newcommand\hY{\hat {\mathbf{Y}}}
\newcommand\bH{{\mathbf{H}}}
\newcommand\bD{{\mathbf{D}}}
\newcommand\be{{\mathbf{e}}}
\newcommand\br{{\mathbf{r}}}
\newcommand\bSig{{\mathbf{\Sigma}}}
\newcommand\bPi{{\mathbf{\Pi}}}
\newcommand\hphi{{\hat\phi}}
\)
Sketching for Matrix Computations:
Least-squares and Multiple-Response Regression
Ken Clarkson
IBM Research, Almaden
- These slides at
http://kenclarkson.org/G2S3_2015/
- Enable JavaScript; navigate as in PowerPoint
- Math may take some time to render properly; if it doesn't end up looking right, try reloading
Least-squares regression, again:
weaker conditions
Before:
`\bS ` a subspace `\eps`-embedding for `[\bA\ \bb]\implies`
`\argmin_{\bx\in\R^d}\norm{\bS (\bA\bx -\bb)}` `\eps`-approx for `\min_{\bx\in\R^d}\norm{\bA\bx -\bb}`
However, weaker conditions are enough:
If
- `\bA\in\R^{n\times d}` with `\rank(\bA )=r`, `\bb\in\R^n`
- `\bS ` with `\Err_\bS(\bW ,\bW ')\le\frac23\sqrt{\eps/r}`, for any fixed `\bW ,\bW '`
- `\bS ` a subspace `(1/9)`-embedding for `\bA `
Then for
$$
\begin{align*}
\tx & \equiv\argmin_{\bx\in\R^d}\norm{\bS (\bA\bx -\bb)}
\\\bx_* & \equiv\argmin_{\bx\in\R^d}\norm{\bA\bx -\bb},
\end{align*}
$$
we have
$$\norm{\bA\tx -\bb}\le (1+\eps)\Delta_*,\text{ where }\Delta_*\equiv\norm{\bA\bx_*-\bb}$$
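As a quick numerical illustration of this guarantee (a minimal sketch, assuming NumPy; the dense Gaussian sketch and the dimensions here are illustrative choices, not part of the statement):
```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 10_000, 20, 400       # m: sketch dimension (illustrative)

A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

# Exact solution x_* and its residual norm Delta_*
x_star, *_ = np.linalg.lstsq(A, b, rcond=None)
delta_star = np.linalg.norm(A @ x_star - b)

# Sketched solution x~ = argmin_x ||S(Ax - b)||
S = rng.standard_normal((m, n)) / np.sqrt(m)   # E[S^T S] = I
x_tilde, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)

# The ratio ||A x~ - b|| / Delta_* is typically close to 1
print(np.linalg.norm(A @ x_tilde - b) / delta_star)
```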
Least-squares, again:
smaller sketches
Note that the conditions on `\bS ` hold, with constant failure probability, if for any fixed `\by`,
$$\norme{\Err(\by )}_{2,\bS }\le K\min\{\sqrt{\eps/r}, 1/r\}$$
(that is, the standard deviation over `\bS ` of the error in estimating the norm of `\by` is appropriately bounded),
so that a sparse embedding to `O(r/\eps + r^2)` dimensions would suffice.
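Such a sparse embedding can be applied without forming `\bS ` explicitly, since each column of `\bS ` has a single `\pm 1` entry. A minimal CountSketch-style sketch of this idea, assuming NumPy (the function name `sparse_embed` and the test dimensions are this example's own):
```python
import numpy as np

def sparse_embed(Y, m, rng):
    """Apply an m-row CountSketch-style embedding to dense Y (n x k).

    The sketch has one nonzero per input coordinate, so applying it
    to a sparse matrix would cost O(nnz(Y)).
    """
    n = Y.shape[0]
    rows = rng.integers(0, m, size=n)        # hash each coordinate to a row
    signs = rng.choice([-1.0, 1.0], size=n)  # independent random signs
    SY = np.zeros((m, Y.shape[1]))
    np.add.at(SY, rows, signs[:, None] * Y)  # scatter-add signed rows of Y
    return SY

# Illustrative use: sketch [A b] once, then solve the small problem.
rng = np.random.default_rng(1)
r, eps = 20, 0.1
m = int(r / eps + r * r)                     # O(r/eps + r^2) rows, per the slide
A = rng.standard_normal((5_000, r))
b = rng.standard_normal((5_000, 1))
SAb = sparse_embed(np.hstack([A, b]), m, rng)
x_tilde, *_ = np.linalg.lstsq(SAb[:, :r], SAb[:, r:], rcond=None)
```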
Least squares proof, 1/2
- We can assume WLOG that `\bA ` has orthonormal columns
- `\bS ` is a `(1/9)`-embedding
$$\implies\norm{\bA^\top\bS^\top\bS\bA -\Iden}_2\le 1/3
\implies \sigma_\min(\bA^\top\bS^\top\bS\bA) \ge 2/3$$
- The normal equations:
$$\bA^\top (\bA\bx_* -\bb) = 0 = \bA^\top\bS^\top\bS (\bA\tx -\bb)$$
- Using the first normal eq. and by Pythagoras,
$$\begin{align*}
\norm{\bA\tx-\bb}^2 & = \norm{\bA\tx -\bA\bx_*}^2 +\norm{\bA\bx_*-\bb}^2\\ & = \norm{\tx -\bx_*}^2 +\Delta_*^2
\end{align*}$$
- Since `\bA ` has orthonormal columns, `\norm{\bA\by }= \norm{\by }` for any `\by`, which gives the second equality above
So it is enough to show that
$$\norm{\tx -\bx_*}\le\sqrt{\eps}\Delta_*,$$
since then `\norm{\bA\tx -\bb}^2\le (1+\eps)\Delta_*^2`, and so `\norm{\bA\tx -\bb}\le (1+\eps)\Delta_*`.
(Note: this bound is for `\tx,\bx_*` with `\bA ` assumed to have orthonormal columns, that is, where an orthonormal basis is used for the analysis.)
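The Pythagorean decomposition above can be checked numerically (a minimal sketch, assuming NumPy; `Q` plays the role of the orthonormal `\bA `):
```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 2_000, 15
Q, _ = np.linalg.qr(rng.standard_normal((n, d)))   # orthonormal columns
b = rng.standard_normal(n)

x_star = Q.T @ b                       # minimizer when columns are orthonormal
delta2 = np.linalg.norm(Q @ x_star - b) ** 2

x_tilde = x_star + 0.01 * rng.standard_normal(d)   # any other candidate
lhs = np.linalg.norm(Q @ x_tilde - b) ** 2
rhs = np.linalg.norm(x_tilde - x_star) ** 2 + delta2
print(abs(lhs - rhs))                  # ~ 0, up to roundoff
```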
Least squares proof, 2/2
$$\begin{alignat*}{2}
(2/3) & \norm{\tx -\bx_*} & &
\\ & \le \norm{\bA^\top \bS^\top\bS\bA (\tx-\bx_*)} & & (\text{since }\sigma_\min(\bA^\top\bS^\top\bS\bA )\ge 2/3 )
\\ & = \norm{\bA^\top\bS^\top\bS (\bA\tx -\bA\bx_*) -\bA^\top\bS^\top\bS (\bA\tx -\bb) } \quad & & (\text{sketched normal eq.} )
\\ & = \norm{\bA^\top\bS^\top\bS (\bb -\bA\bx_*) } & &
\\ & = \Err_\bS(\bA^\top ,\bb-\bA\bx_*)\norm{\bA }_F\norm{\bb-\bA\bx_*} & & (\text{other normal eq.} )
\\ & \le (2/3)\sqrt{\eps/r}\norm{\bA }_F\norm{\bb-\bA\bx_*} & & (\text{matmult bound} )
\\ & = (2/3)\sqrt{\eps}\Delta_*, & & (\rank(\bA )=r,\ \bA^\top\bA =\Iden\implies\norm{\bA }_F=\sqrt{r} )
\end{alignat*}$$
So
$$\norm{\tx -\bx_*}\le \sqrt{\eps}\Delta_*,$$
as claimed.
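The final bound can also be observed empirically (a minimal sketch, assuming NumPy; the Gaussian sketch and the heuristic sketch size `m = 4d/\eps` are this example's own choices):
```python
import numpy as np

rng = np.random.default_rng(3)
n, d, eps = 5_000, 10, 0.1
Q, _ = np.linalg.qr(rng.standard_normal((n, d)))   # orthonormal columns
b = rng.standard_normal(n)

x_star = Q.T @ b
delta_star = np.linalg.norm(Q @ x_star - b)

m = int(4 * d / eps)                    # heuristic sketch size (assumption)
S = rng.standard_normal((m, n)) / np.sqrt(m)
x_tilde, *_ = np.linalg.lstsq(S @ Q, S @ b, rcond=None)

# Typically the first value is below the second, as the proof predicts
print(np.linalg.norm(x_tilde - x_star), np.sqrt(eps) * delta_star)
```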
Multiple-response regression
Let
- `\bA\in\R^{n\times d}`
- `\bB\in\R^{n\times d'}`
The multiple-response, or generalized, regression problem is to find
$$\argmin_{\bX\in\R^{d\times d'}}\norm{\bA\bX -\bB}_F$$
Since
$$\norm{\bA\bX -\bB}_F^2 = \sum_{j\in[d']}\norm{\bA\bX_{*j} -\bB_{*j}}^2,$$
the problem is no harder than `d'` independent least-squares problems.
It can be easier: for example, sketch `\bA ` once.
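A minimal illustration of the sketch-once idea, assuming NumPy (Gaussian sketch and dimensions are illustrative):
```python
import numpy as np

rng = np.random.default_rng(4)
n, d, dprime, m = 8_000, 20, 50, 400

A = rng.standard_normal((n, d))
B = rng.standard_normal((n, dprime))

# One sketch of A (and B) serves all d' response columns at once
S = rng.standard_normal((m, n)) / np.sqrt(m)
X_tilde, *_ = np.linalg.lstsq(S @ A, S @ B, rcond=None)

X_star, *_ = np.linalg.lstsq(A, B, rcond=None)
print(np.linalg.norm(A @ X_tilde - B, 'fro') /
      np.linalg.norm(A @ X_star - B, 'fro'))       # close to 1
```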
Multiple-response regression
Nearly the same proof as for 1-response regression shows:
If
- `\bA\in\R^{n\times d}` with `\rank(\bA )=r`, `\bB\in\R^{n\times d'}`
- `\bS ` with `\Err_\bS(\bW ,\bW ')\le\frac23\sqrt{\eps/r}`, for any fixed `\bW ,\bW '`
- `\bS ` a subspace `(1/9)`-embedding for `\bA `
Then for
$$
\begin{align*}
\tX & \equiv\argmin_{\bX\in\R^{d\times d'}}\norm{\bS (\bA\bX -\bB)}_F
\\\bX_* & \equiv\argmin_{\bX\in\R^{d\times d'}}\norm{\bA\bX - \bB}_F,
\end{align*}
$$
we have
$$\norm{\bA\tX-\bB}_F\le (1+\eps)\Delta_*,\text{ where }\Delta_*\equiv\norm{\bA\bX _*-\bB}_F$$
- Same conditions on sketch as for 1-response
- `\norm{\text{vector}}\rightarrow\norm{\text{matrix}}_F`
- Pythagoras `\rightarrow` Matrix Pythagoras (see the identity below)
- The subspace-embedding version (sketching `[\bA\ \bb]`) does not generalize: a subspace embedding for `[\bA\ \bB]` would need sketch size growing with `d'`
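For reference, the Matrix Pythagoras step is the identity: if `\bA^\top (\bA\bX_*-\bB ) = 0`, then for any `\bX `,
$$\norm{\bA\bX -\bB}_F^2 = \norm{\bA (\bX -\bX_*)}_F^2 +\norm{\bA\bX_*-\bB}_F^2,$$
since the cross term `2\,\mathrm{tr}\big((\bX -\bX_*)^\top\bA^\top (\bA\bX_*-\bB )\big)` vanishes.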