\(
\newcommand\norm[1]{|| #1 ||}
\newcommand\nnz[1]{{\textrm{nnz}(#1)}}
\newcommand\E{\mathbf{E}}
\newcommand\Dist{\mathbf{Dist}}
\newcommand\reals{{\mathrm{I\!R}}}
\newcommand\R{{\mathrm{I\!R}}}
\newcommand\poly{{\textrm{poly}}}
\newcommand\rank{{\textrm{rank}}}
\newcommand\colspan{{\textrm{range}}}
\newcommand\diam{{\textrm{diam}}}
\newcommand\polylog{{\textrm{polylog}}}
\newcommand\rowspan{{\textrm{rowspan}}}
\newcommand\colbasis{{\textrm{colbasis}}}
\newcommand\argmin{{\mathrm{argmin}}}
\newcommand\Err{{\mathtt{NE}}}
\newcommand\eps{{\epsilon}}
\newcommand\bfone{{\mathbf{1}}}
\newcommand\Var{{\mathbf{Var}}}
\newcommand\stdev[1]{{\mathbf{stdev(#1)}}}
\newcommand\cgf{{\mathbf{CGF}}}
\newcommand\Iden{{\mathbf{I}}}
\newcommand\Prob{{\mathbf{Pr}}}
\DeclareMathOperator\subg{\mathbf{subG}}
\newcommand\cN{{\cal N}}
\newcommand\cS{{\cal S}}
\newcommand\cD{{\cal D}}
\newcommand\cM{{\cal M}}
\newcommand\cE{{\cal E}}
\newcommand\cC{{\cal C}}
\newcommand\KJL{K_{\mathbb{JL}}}
\newcommand\tO{{\tilde O}}
\newcommand\tx{{\tilde{\mathbf{x}}}}
\newcommand\tX{{\tilde{\mathbf{X}}}}
\newcommand\tY{{\tilde{\mathbf{Y}}}}
\newcommand\tA{{\tilde{\mathbf{A}}}}
\newcommand\bA{{\mathbf{A}}}
\newcommand\ba{{\mathbf{a}}}
\newcommand\bw{{\mathbf{w}}}
\newcommand\bW{{\mathbf{W}}}
\newcommand\bR{{\mathbf{R}}}
\newcommand\bB{{\mathbf{B}}}
\newcommand\bQ{{\mathbf{Q}}}
\newcommand\bZ{{\mathbf{Z}}}
\newcommand\bz{{\mathbf{z}}}
\newcommand\bU{{\mathbf{U}}}
\newcommand\bV{{\mathbf{V}}}
\newcommand\bX{{\mathbf{X}}}
\newcommand\bS{{\mathbf{S}}}
\newcommand\hS{\hat {\mathbf{S}}}
\newcommand\bs{{\mathbf{s}}}
\newcommand\bx{{\mathbf{x}}}
\newcommand\by{{\mathbf{y}}}
\newcommand\bb{{\mathbf{b}}}
\newcommand\bT{{\mathbf{T}}}
\newcommand\bP{{\mathbf{P}}}
\newcommand\bG{{\mathbf{G}}}
\newcommand\bY{{\mathbf{Y}}}
\newcommand\hY{\hat {\mathbf{Y}}}
\newcommand\bH{{\mathbf{H}}}
\newcommand\bD{{\mathbf{D}}}
\newcommand\be{{\mathbf{e}}}
\newcommand\br{{\mathbf{r}}}
\newcommand\bSig{{\mathbf{\Sigma}}}
\newcommand\bPi{{\mathbf{\Pi}}}
\newcommand\hphi{{\hat\phi}}
\)
Sketching for Matrix Computations:
Embeddings and Regression
Ken Clarkson
IBM Research, Almaden
- These slides at http://kenclarkson.org/G2S3_2015/
Sketching Matrices:
Oblivious Subspace Embeddings
Matrix `\bS\in\R^{m\times n}` is an `\eps`-embedding of set `P\subset\R^n`
if
$$\norm{\bS\by }\approx_\eps\norm{\by }\text{ for all }\by\in P.$$
`\bS` will be called a "sketching matrix".
`a\approx_\eps b\equiv | a - b |\le\eps |b|`
For `\bA\in\R^{n\times d}`,
a matrix `\bS ` is a subspace `\eps`-embedding
for `\bA ` if `\bS ` is an `\eps`-embedding for `\colspan(\bA )= \{\bA\bx\mid\bx\in\R^d\}`. That is,
$$\norm{\bS\bA\bx }\approx_\eps\norm{\bA\bx }\text{ for all }\bx\in\R^d.$$
`\bS\bA` will be called a "sketch".
An oblivious subspace embedding is:
- A probability distribution `\cal D` over matrices `\bS\in\R^{m\times n}`, so that
- For any unknown but fixed matrix `\bA`,
`\bS ` is a subspace `\eps`-embedding for `\bA `, with high-enough probability.
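A minimal numerical sketch of this definition (not from these slides; the Gaussian `\bS ` and all sizes are illustrative assumptions): draw `\bS ` obliviously, then check that `\norm{\bS\bA\bx }\approx\norm{\bA\bx }` for a few random `\bx `.
```python
# Illustrative only: a scaled Gaussian S drawn obliviously to A.
# All sizes (n, d, m) are arbitrary demonstration choices.
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 5_000, 10, 400                      # heuristically, eps ~ sqrt(d/m) here
A = rng.standard_normal((n, d))               # fixed, but unknown when S is drawn
S = rng.standard_normal((m, n)) / np.sqrt(m)  # oblivious: depends only on n and m

for _ in range(3):                            # spot-check the embedding property
    x = rng.standard_normal(d)
    ratio = np.linalg.norm(S @ (A @ x)) / np.linalg.norm(A @ x)
    print(f"||S A x|| / ||A x|| = {ratio:.3f}")  # close to 1 with high probability
```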
Least-squares regression
- Input is `n` points in `d+1` dimensions
- Points have the form `(\bA_{i*},\bb_i)`,
where:
- `\bA ` is an `n\times d` matrix
- `\bA_{i*}\in\reals^d` is row `i` of a matrix `\bA `
- `\bb\in\reals^n` is a column vector
- Regression [G95][L05]: find a good `\bx\in\reals^d` so that `\bA_{i*}\bx ` is a good prediction for `\bb_i`
- "Good" means minimizing the sum of squared errors:
$$
\min_{\bx\in\reals^d}\sum_i (\bA_{i*}\bx -\bb_i)^2 = \min_{\bx\in\reals^d}\norm{\bA\bx -\bb}_2^2
$$
- `\br\equiv\bA\bx -\bb` is the residual
- `\norm{\br}_2^2 = \sum_i\br_i^2`, the squared Euclidean length of `\br`
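For reference, a plain (unsketched) least-squares computation on synthetic data; the data and sizes below are illustrative assumptions, not part of the slides.
```python
# Ordinary least squares, no sketching yet.  Synthetic data for illustration.
import numpy as np

rng = np.random.default_rng(1)
n, d = 2_000, 5
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)  # noisy observations

x_star, *_ = np.linalg.lstsq(A, b, rcond=None)  # argmin_x ||Ax - b||_2
r = A @ x_star - b                              # residual vector
print("||r||_2^2 =", float(r @ r))              # squared Euclidean length of r
```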
Subspace embeddings for regression
Suppose `\bS` is a subspace embedding for `\colspan([\bA\ \bb])`, so
$$\norm{\bS [\bA\ \bb]\by }\approx_\eps\norm{[\bA\ \bb]\by }\text{ for }\by\in\R^{d+1}.$$
Then in particular this is true when `\by=\left[\begin{smallmatrix}\bx\\ -1\end{smallmatrix}\right]`, `\bx\in\R^d`, so
$$\norm{\bS (\bA\bx -\bb)}\approx_\eps\norm{\bA\bx -\bb} = \norm{\br}.$$
That is,
if
$$\begin{align*}
\tx\equiv & \argmin_{\bx\in\R^d}\norm{\bS (\bA\bx -\bb)}
\\ \bx^*\equiv & \argmin_{\bx\in\R^d}\norm{\bA\bx -\bb},\end{align*}
$$
then
$$
\begin{align*}
\norm{\bA\tx-\bb}
& \le\frac{\norm{\bS (\bA\tx-\bb)}}{1-\eps}
\le\frac{\norm{\bS (\bA\bx^*-\bb)}} {1-\eps}
\le\norm{\bA\bx^*-\bb}\frac{1+\eps}{1-\eps},
\end{align*}$$
and so for `\eps\le 1/3`, `\norm{\bA\tx-\bb}\le\norm{\bA\bx^*-\bb}(1+3\eps)`.
Solving least-squares in "sketch space"
yields a good solution in the original space.
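A small numerical check of this guarantee (sketch-and-solve); the Gaussian `\bS `, the synthetic data, and the sizes are assumptions made for the demo.
```python
# Sketch-and-solve regression: compare the sketched solution's residual,
# measured in the original space, with the optimal residual.
import numpy as np

rng = np.random.default_rng(2)
n, d, m = 10_000, 20, 1_000
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + rng.standard_normal(n)

x_star, *_ = np.linalg.lstsq(A, b, rcond=None)           # argmin ||Ax - b||
S = rng.standard_normal((m, n)) / np.sqrt(m)              # oblivious Gaussian sketch
x_tilde, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)   # argmin ||S(Ax - b)||

opt = np.linalg.norm(A @ x_star - b)
apx = np.linalg.norm(A @ x_tilde - b)
print(f"optimal {opt:.3f}, sketched {apx:.3f}, ratio {apx / opt:.4f}")  # ratio ~ 1 + O(eps)
```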
The advantages of obliviousness
`\bS ` is oblivious: the distribution `\cal D`
does not depend on the input data.
Algorithmic advantages:
- Construct `\bS `, then input `\bA `
- Streaming: when an entry of `\bA ` changes, `\bS\bA ` is easy to update
- Distributed: if `\bA = \sum_p\bA^p` over processors `p`:
- Compute `\bS\bA^p` at each processor
- Ship to master for summation
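A toy illustration of the distributed point, using only linearity of the sketch; the partition into `\bA^p` and the Gaussian `\bS ` are assumptions for the demo.
```python
# Distributed sketching by linearity: S (sum_p A^p) = sum_p S A^p.
# Each "processor" sketches its summand; the master adds the small m x d pieces.
import numpy as np

rng = np.random.default_rng(3)
n, d, m, P = 4_000, 8, 200, 4
parts = [rng.standard_normal((n, d)) for _ in range(P)]  # A^p held at processor p
A = sum(parts)                                           # A = sum_p A^p
S = rng.standard_normal((m, n)) / np.sqrt(m)             # shared oblivious sketch

local = [S @ Ap for Ap in parts]        # computed locally, each only m x d
SA = sum(local)                         # summed at the master
print(np.allclose(SA, S @ A))           # True: identical to sketching A directly
```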
The advantages of obliviousness: analytic
Analysis advantages: for any condition `\cE`
$$
\cE\text{ holds for }\bS\bA\bx\text{ for all }\bx\in\R^d\\
\iff\\
\cE\text{ holds for }\bS\by\text{ for all }\by\in\colspan(\bA )\\
\iff\\
\cE\text{ holds for }\bS\bU\mathbf{z}\text{ for all }\mathbf{z}\in\R^{\rank(\bA )},\\
\qquad\qquad\bU\text{ an orthonormal basis of }\colspan(\bA )
$$
So often, WLOG,
`\bA ` has full rank and orthonormal columns, that is, `\bA ` is an orthonormal matrix.
Also, constraints "for free", meaning:
$$\min_{\bx\in{\cal C}}\norm{\bS\bA\bx }\approx\min_{\bx\in{\cal C}}\norm{\bA\bx },$$
for any constraint set `\cal C`, such as `\cC=\{\bx\mid\bx\ge 0\}`.
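As a hedged illustration, applying the constrained guarantee in regression form (objective `\norm{\bA\bx -\bb}` with `\bx\ge 0`), using scipy's nonnegative least squares; the data, sketch, and sizes are assumptions for the demo.
```python
# "Constraints for free": solve nonnegative least squares on (S A, S b) and
# evaluate the result, in the original space, against the exactly constrained solution.
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(4)
n, d, m = 5_000, 15, 600
A = rng.standard_normal((n, d))
b = A @ np.abs(rng.standard_normal(d)) + rng.standard_normal(n)
S = rng.standard_normal((m, n)) / np.sqrt(m)

x_con, _ = nnls(A, b)              # argmin_{x >= 0} ||Ax - b||
x_skt, _ = nnls(S @ A, S @ b)      # same constraint, sketched objective
print(np.linalg.norm(A @ x_con - b), np.linalg.norm(A @ x_skt - b))  # comparable residuals
```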