\( \newcommand\norm[1]{|| #1 ||} \newcommand{\myvertiii}[1]{{\vert\kern-0.2ex\vert\kern-0.2ex\vert #1 \vert \kern-0.2ex \vert\kern-0.2ex\vert}} \newcommand{\norme}[1]{{\myvertiii{#1}}} \newcommand\nnz[1]{{\textrm{nnz}(#1)}} \newcommand\E{\mathbf{E}} \newcommand\Dist{\mathbf{Dist}} \newcommand\reals{{\mathrm{I\!R}}} \newcommand\R{{\mathrm{I\!R}}} \newcommand\poly{{\textrm{poly}}} \newcommand\rank{{\textrm{rank}}} \newcommand\colspan{{\textrm{range}}} \newcommand\diam{{\textrm{diam}}} \newcommand\polylog{{\textrm{polylog}}} \newcommand\rowspan{{\textrm{rowspan}}} \newcommand\colbasis{{\textrm{colbasis}}} \newcommand\argmin{{\mathrm{argmin}}} \newcommand\Err{{\mathtt{NE}}} \newcommand\eps{{\epsilon}} \newcommand\bfone{{\mathbf{1}}} \newcommand\Var{{\mathbf{Var}}} \newcommand\stdev[1]{{\mathbf{stdev(#1)}}} \newcommand\cgf{{\mathbf{CGF}}} \newcommand\bfone{{\mathbb{1}}} \newcommand\Iden{{\mathbf{I}}} \newcommand\Prob{{\mathbf{Pr}}} \DeclareMathOperator\subg{\mathbf{subG}} \newcommand\cN{{\cal N}} \newcommand\cS{{\cal S}} \newcommand\cD{{\cal D}} \newcommand\cM{{\cal M}} \newcommand\cE{{\cal E}} \newcommand\cC{{\cal C}} \newcommand\KJL{K_{\mathbb{JL}}} \newcommand\tO{{\tilde O}} \newcommand\tx{{\tilde{\mathbf{x}}}} \newcommand\tX{{\tilde{\mathbf{X}}}} \newcommand\tY{{\tilde{\mathbf{Y}}}} \newcommand\tA{{\tilde{\mathbf{A}}}} \newcommand\bA{{\mathbf{A}}} \newcommand\ba{{\mathbf{a}}} \newcommand\bw{{\mathbf{w}}} \newcommand\bW{{\mathbf{W}}} \newcommand\bR{{\mathbf{R}}} \newcommand\bB{{\mathbf{B}}} \newcommand\bQ{{\mathbf{Q}}} \newcommand\bZ{{\mathbf{Z}}} \newcommand\bz{{\mathbf{z}}} \newcommand\bU{{\mathbf{U}}} \newcommand\bV{{\mathbf{V}}} \newcommand\bX{{\mathbf{X}}} \newcommand\bS{{\mathbf{S}}} \newcommand\hS{\hat {\mathbf{S}}} \newcommand\bs{{\mathbf{s}}} \newcommand\bx{{\mathbf{x}}} \newcommand\by{{\mathbf{y}}} \newcommand\bb{{\mathbf{b}}} \newcommand\bT{{\mathbf{T}}} \newcommand\bP{{\mathbf{P}}} \newcommand\bG{{\mathbf{G}}} \newcommand\bY{{\mathbf{Y}}} \newcommand\hY{\hat {\mathbf{Y}}} \newcommand\bH{{\mathbf{H}}} \newcommand\bD{{\mathbf{D}}} \newcommand\be{{\mathbf{e}}} \newcommand\br{{\mathbf{r}}} \newcommand\bSig{{\mathbf{\Sigma}}} \newcommand\bPi{{\mathbf{\Pi}}} \newcommand\hphi{{\hat\phi}} \)

Sketching for Matrix Computations:
Least-squares and Multiple-Response Regression

Ken Clarkson
IBM Research, Almaden



  • These slides at
    http://kenclarkson.org/G2S3_2015/
  • Enable JavaScript; navigate as in PowerPoint
  • Math may take some time to render properly; if it doesn't end up looking right, try reloading

Least-squares regression, again:
weaker conditions

Before:
`\bS ` a subspace `\eps`-embedding for `[\bA\ \bb]\implies`
`\argmin_{\bx\in\R^d}\norm{\bS (\bA\bx -\bb)}` is an `\eps`-approximation for `\min_{\bx\in\R^d}\norm{\bA\bx -\bb}`

However, weaker conditions are enough:
If
  • `\bA\in\R^{n\times d}` with `\rank(\bA )=r`, `\bb\in\R^n`
  • `\bS ` with `\Err_\bS(\bW ,\bW ')\le\frac23\sqrt{\eps/r}`, for fixed `\bW ,\bW '`
  • `\bS ` a subspace `(1/9)`-embedding for `\bA `

Then for $$ \begin{align*} \tx & \equiv\argmin_{\bx\in\R^d}\norm{\bS (\bA\bx -\bb)} \\\bx_* & \equiv\argmin_{\bx\in\R^d}\norm{\bA\bx -\bb}, \end{align*} $$ we have $$\norm{\bA\tx -\bb}\le (1+\eps)\Delta_*,\text{ where }\Delta_*\equiv\norm{\bA\bx_*-\bb}$$
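A minimal numerical sketch of this guarantee (all sizes, the Gaussian choice of `\bS `, and the constant in the sketch size are illustrative assumptions, not part of the statement above):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, eps = 4000, 20, 0.1                     # illustrative sizes (assumptions)
A = rng.standard_normal((n, d))               # here rank(A) = r = d
b = A @ rng.standard_normal(d) + rng.standard_normal(n)  # noisy right-hand side

m = int(4 * d / eps)                          # sketch rows ~ r/eps; constant is ad hoc
S = rng.standard_normal((m, n)) / np.sqrt(m)  # Gaussian sketching matrix

x_star, *_ = np.linalg.lstsq(A, b, rcond=None)           # exact least squares
x_tilde, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)  # sketched least squares

Delta_star = np.linalg.norm(A @ x_star - b)
print(np.linalg.norm(A @ x_tilde - b) / Delta_star)      # typically <= 1 + eps
```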

Least-squares, again:
smaller sketches

Note that with constant failure probability, the conditions on `\bS ` hold if, for any fixed `\by`, $$\norme{\Err(\by )}_{2,\bS }\le K\min\{\sqrt{\eps/r}, 1/r\}$$ (that is, if the standard deviation of the error in estimating the norm of `\by` is bounded appropriately),
so that a sparse embedding to `O(r/\eps + r^2)` dimensions would suffice.
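To make this concrete, here is one standard sparse embedding (a CountSketch, with a single random `\pm 1` entry per column of `\bS `), applied without ever forming `\bS ` densely; the sizes and the omitted constants are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
n, r, eps = 100_000, 20, 0.1
m = int(r / eps + r * r)                 # O(r/eps + r^2) rows; constants omitted

# Sparse embedding (CountSketch): S has one random +-1 entry per column,
# so S @ M costs O(nnz(M)) -- no dense m x n matrix is ever formed.
h = rng.integers(0, m, size=n)           # hash: target row for each input row
s = rng.choice([-1.0, 1.0], size=n)      # random sign for each input row

def apply_sketch(M):
    """Compute S @ M by scatter-adding signed rows of M."""
    SM = np.zeros((m, M.shape[1]))
    np.add.at(SM, h, s[:, None] * M)
    return SM

A = rng.standard_normal((n, r))
U, _ = np.linalg.qr(A)                   # orthonormal basis for range(A)
sv = np.linalg.svd(apply_sketch(U), compute_uv=False)
print(sv.min(), sv.max())                # all near 1 <=> subspace embedding
```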

Least-squares proof, 1/2

  1. We can assume WLOG that `\bA ` has orthonormal columns
  2. `\bS ` is a `(1/9)`-embedding $$\implies\norm{\bA^\top\bS^\top\bS\bA -\Iden}_2\le 1/3 \implies \sigma_\min(\bA^\top\bS^\top\bS\bA) \ge 2/3$$
  3. The normal equations, for the original and sketched problems: $$\bA^\top (\bA\bx_* -\bb) = 0 = \bA^\top\bS^\top\bS (\bA\tx -\bb)$$
  4. Using the first normal equation and Pythagoras, $$\begin{align*} \norm{\bA\tx-\bb}^2 & = \norm{\bA\tx -\bA\bx_*}^2 +\norm{\bA\bx_*-\bb}^2\\ & = \norm{\tx -\bx_*}^2 +\Delta_*^2 \end{align*}$$
    • By (1), `\norm{\bA\by }= \norm{\by }` for any `\by`, which gives the second equality

So it is enough to show that $$\norm{\tx -\bx_*}\le\sqrt{\eps}\Delta_*.$$ (Note: this bound is for `\tx ,\bx_*` with `\bA ` assumed orthonormal, that is, where an orthonormal basis is used for the analysis.)
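A small numerical check of steps 1-4, under illustrative sizes (the QR factorization supplies the WLOG orthonormal `\bA `, so `\bx_* =\bA^\top\bb`):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, m = 5000, 10, 400                           # illustrative sizes
A, _ = np.linalg.qr(rng.standard_normal((n, d)))  # step 1: orthonormal columns WLOG
b = rng.standard_normal(n)
S = rng.standard_normal((m, n)) / np.sqrt(m)

x_star = A.T @ b          # normal eq. with A^T A = I: A^T (A x_* - b) = 0
x_tilde, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)

# step 2: singular values of A^T S^T S A should be within about 1/3 of 1
print(np.linalg.svd(A.T @ (S.T @ (S @ A)), compute_uv=False))

# step 4: Pythagoras, using ||A y|| = ||y|| for orthonormal A
lhs = np.linalg.norm(A @ x_tilde - b) ** 2
rhs = np.linalg.norm(x_tilde - x_star) ** 2 + np.linalg.norm(A @ x_star - b) ** 2
print(lhs - rhs)          # ~ 0 up to rounding
```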

Least-squares proof, 2/2

$$\begin{alignat*}{2} (2/3) & \norm{\tx -\bx_*} & & \\ & \le \norm{\bA^\top \bS^\top\bS\bA (\tx-\bx_*)} & & (\text{since }\sigma_\min(\bA^\top\bS^\top\bS\bA )\ge 2/3 ) \\ & = \norm{\bA^\top\bS^\top\bS (\bA\tx -\bA\bx_*) -\bA^\top\bS^\top\bS (\bA\tx -\bb)} \quad & & (\text{sketched normal eq.: second term is }0 ) \\ & = \norm{\bA^\top\bS^\top\bS (\bb -\bA\bx_*)} & & \\ & = \Err_\bS(\bA^\top ,\bb-\bA\bx_*)\norm{\bA }_F\norm{\bb-\bA\bx_*} & & (\text{other normal eq.: }\bA^\top (\bb-\bA\bx_*)=0 ) \\ & \le (2/3)\sqrt{\eps/r}\norm{\bA }_F\norm{\bb-\bA\bx_*} & & (\text{matmult bound} ) \\ & = (2/3)\sqrt{\eps}\Delta_*, & & (\rank(\bA )=r,\ \bA^\top\bA =\Iden\implies\norm{\bA }_F=\sqrt{r} ) \end{alignat*}$$

So $$\norm{\tx -\bx_*}\le \sqrt{\eps}\Delta_*,$$ as claimed.
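Empirically, `\norm{\tx -\bx_*}/\Delta_*` decays like `\sqrt{r/m}` as the Gaussian sketch grows, matching the `\sqrt{\eps}` rate above; a quick check under illustrative sizes:

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 3000, 10
A, _ = np.linalg.qr(rng.standard_normal((n, d)))   # orthonormal; rank r = d
b = rng.standard_normal(n)
x_star = A.T @ b
Delta_star = np.linalg.norm(A @ x_star - b)

for m in [50, 200, 800]:          # growing sketch: eps ~ r/m shrinks
    S = rng.standard_normal((m, n)) / np.sqrt(m)
    x_tilde, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
    # the key term in the chain above: ||A^T S^T S (b - A x_*)||
    key = np.linalg.norm(A.T @ (S.T @ (S @ (b - A @ x_star))))
    print(m, np.linalg.norm(x_tilde - x_star) / Delta_star, key / Delta_star)
```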

Multiple-response regression

Let
  1. `\bA\in\R^{n\times d}`
  2. `\bB\in\R^{n\times d'}`

The multiple-response, or generalized, regression problem is to find $$\argmin_{\bX\in\R^{d\times d'}}\norm{\bA\bX -\bB}_F$$

Since $$\norm{\bA\bX -\bB}_F^2 = \sum_{j\in[d']}\norm{\bA\bX_{*j} - \bB_{*j}}^2,$$ the problem is no harder than `d'` independent least-squares problems.

It can be easier: for example, sketch `\bA ` once.
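For instance, with a single Gaussian sketch (sizes here are illustrative assumptions), one `lstsq` call on `(\bS\bA ,\bS\bB )` handles all `d'` response columns:

```python
import numpy as np

rng = np.random.default_rng(4)
n, d, dp, m = 4000, 20, 50, 800      # illustrative sizes; dp stands for d'
A = rng.standard_normal((n, d))
B = rng.standard_normal((n, dp))

S = rng.standard_normal((m, n)) / np.sqrt(m)
SA, SB = S @ A, S @ B                # sketch A (and B) once...

# ...then one lstsq call solves all d' response columns together
X_tilde, *_ = np.linalg.lstsq(SA, SB, rcond=None)
X_star, *_ = np.linalg.lstsq(A, B, rcond=None)

print(np.linalg.norm(A @ X_tilde - B) / np.linalg.norm(A @ X_star - B))
```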

Multiple-response regression

Nearly the same proof as for 1-response regression shows:
If
  • `\bA\in\R^{n\times d}` with `\rank(\bA )=r`, `\bB\in\R^{n\times d'}`
  • `\bS ` with `\Err_\bS(\bW ,\bW ')\le\frac23\sqrt{\eps/r}`, for fixed `\bW ,\bW '`
  • `\bS ` a subspace `(1/9)`-embedding for `\bA `

Then for $$ \begin{align*} \tX & \equiv\argmin_{\bX\in\R^{d\times d'}}\norm{\bS (\bA\bX -\bB)}_F \\\bX_* & \equiv\argmin_{\bX\in\R^{d\times d'}}\norm{\bA\bX - \bB}_F, \end{align*} $$ we have $$\norm{\bA\tX-\bB}_F\le (1+\eps)\Delta_*,\text{ where }\Delta_*\equiv\norm{\bA\bX _*-\bB}_F$$

  • Same conditions on sketch as for 1-response
  • `\norm{\text{vector}}\rightarrow\norm{\text{matrix}}_F`
  • Pythagoras `\rightarrow` Matrix Pythagoras
  • Subspace embedding version does not generalize: `\rank([\bA\ \bB])` can be as large as `r+d'`, so embedding `[\bA\ \bB]` would need a much larger sketch
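The Matrix Pythagoras identity used here is: if `\bC^\top\bD = 0`, then `\norm{\bC +\bD}_F^2 =\norm{\bC }_F^2 +\norm{\bD }_F^2`. A quick numerical check, applied to the regression decomposition (sizes illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
n, d, dp = 2000, 10, 5
A = rng.standard_normal((n, d))
B = rng.standard_normal((n, dp))
X_star, *_ = np.linalg.lstsq(A, B, rcond=None)
X = rng.standard_normal((d, dp))     # an arbitrary competitor X

# cross term vanishes since A^T (A X_* - B) = 0 (normal equations):
lhs = np.linalg.norm(A @ X - B) ** 2
rhs = (np.linalg.norm(A @ (X - X_star)) ** 2
       + np.linalg.norm(A @ X_star - B) ** 2)
print(lhs - rhs)                     # ~ 0 up to rounding
```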