\newcommand\norm[1]{|| #1 ||}
\newcommand\nnz[1]{{\textrm{nnz}(#1)}}
\newcommand\E{\mathbf{E}}
\newcommand\Dist{\mathbf{Dist}}
\newcommand\reals{{\mathrm{I\!R}}}
\newcommand\R{{\mathrm{I\!R}}}
\newcommand\poly{{\textrm{poly}}}
\newcommand\rank{{\textrm{rank}}}
\newcommand\colspan{{\textrm{range}}}
\newcommand\diam{{\textrm{diam}}}
\newcommand\polylog{{\textrm{polylog}}}
\newcommand\rowspan{{\textrm{rowspan}}}
\newcommand\colbasis{{\textrm{colbasis}}}
\newcommand\argmin{{\mathrm{argmin}}}
\newcommand\Err{{\mathtt{NE}}}
\newcommand\eps{{\epsilon}}
\newcommand\bfone{{\mathbf{1}}}
\newcommand\Var{{\mathbf{Var}}}
\newcommand\stdev[1]{{\mathbf{stdev(#1)}}}
\newcommand\cgf{{\mathbf{CGF}}}
\newcommand\Iden{{\mathbf{I}}}
\newcommand\Prob{{\mathbf{Pr}}}
\DeclareMathOperator\subg{\mathbf{subG}}
\newcommand\cN{{\cal N}}
\newcommand\cS{{\cal S}}
\newcommand\cD{{\cal D}}
\newcommand\cM{{\cal M}}
\newcommand\cE{{\cal E}}
\newcommand\cC{{\cal C}}
\newcommand\KJL{K_{\mathbb{JL}}}
\newcommand\tO{{\tilde O}}
\newcommand\tx{{\tilde{\mathbf{x}}}}
\newcommand\tX{{\tilde{\mathbf{X}}}}
\newcommand\tY{{\tilde{\mathbf{Y}}}}
\newcommand\tA{{\tilde{\mathbf{A}}}}
\newcommand\bA{{\mathbf{A}}}
\newcommand\ba{{\mathbf{a}}}
\newcommand\bw{{\mathbf{w}}}
\newcommand\bW{{\mathbf{W}}}
\newcommand\bR{{\mathbf{R}}}
\newcommand\bB{{\mathbf{B}}}
\newcommand\bQ{{\mathbf{Q}}}
\newcommand\bZ{{\mathbf{Z}}}
\newcommand\bz{{\mathbf{z}}}
\newcommand\bU{{\mathbf{U}}}
\newcommand\bV{{\mathbf{V}}}
\newcommand\bX{{\mathbf{X}}}
\newcommand\bS{{\mathbf{S}}}
\newcommand\hS{\hat {\mathbf{S}}}
\newcommand\bs{{\mathbf{s}}}
\newcommand\bx{{\mathbf{x}}}
\newcommand\by{{\mathbf{y}}}
\newcommand\bb{{\mathbf{b}}}
\newcommand\bT{{\mathbf{T}}}
\newcommand\bP{{\mathbf{P}}}
\newcommand\bG{{\mathbf{G}}}
\newcommand\bY{{\mathbf{Y}}}
\newcommand\hY{\hat {\mathbf{Y}}}
\newcommand\bH{{\mathbf{H}}}
\newcommand\bD{{\mathbf{D}}}
\newcommand\be{{\mathbf{e}}}
\newcommand\br{{\mathbf{r}}}
\newcommand\bSig{{\mathbf{\Sigma}}}
\newcommand\bPi{{\mathbf{\Pi}}}
\newcommand\hphi{{\hat\phi}}
Sketching for Matrix Computations:
Embeddings and Regression
Ken Clarkson
IBM Research, Almaden
- These slides at
http://kenclarkson.org/G2S3_2015/
- Enable JavaScript; navigate as in PowerPoint
- Math may take some time to render properly; if it doesn't end up looking right, try reloading
Sketching Matrices:
Oblivious Subspace Embeddings
Matrix \bS\in\R^{m\times n} is an \eps-embedding of set P\subset\R^n
if
\norm{\bS\by }\approx_\eps\norm{\by }\text{ for all }\by\in P.
\bS will be called a "sketching matrix".
a\approx_\eps b\equiv | a - b |\le\eps |b|
For \bA\in\R^{n\times d},
a matrix \bS is a subspace \eps-embedding
for \bA if \bS is an \eps-embedding for \colspan(\bA )= \{\bA\bx\mid\bx\in\R^d\}. That is,
\norm{\bS\bA\bx }\approx_\eps\norm{\bA\bx }\text{ for all }\bx\in\R^d.
\bS\bA will be called a "sketch".
An
oblivious subspace embedding is:
- A probability distribution \cal D over matrices \bS\in\R^{m\times n}, so that
- For any unknown but fixed matrix \bA,
\bS is a subspace \eps-embedding for \bA , with high-enough probability.
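For concreteness, here is a small numerical sketch (not from the original slides) of a plain Gaussian oblivious subspace embedding: \bS is drawn without looking at \bA, and the subspace condition "\norm{\bS\bA\bx}\approx_\eps\norm{\bA\bx} for all \bx" is checked via the singular values of \bS\bU, where \bU is an orthonormal basis of \colspan(\bA). The sizes n, d, m, eps are illustrative choices, not theoretically tuned ones.

# A minimal, hedged illustration: Gaussian oblivious subspace embedding.
import numpy as np

rng = np.random.default_rng(0)
n, d, m, eps = 10_000, 10, 800, 0.2        # illustrative sizes; m << n

A = rng.standard_normal((n, d))            # "unknown but fixed" input matrix
S = rng.standard_normal((m, n)) / np.sqrt(m)   # oblivious: drawn independently of A

# ||S A x|| ~_eps ||A x|| for all x  iff  every singular value of S U lies in
# [1 - eps, 1 + eps], where U has orthonormal columns spanning range(A).
U, _ = np.linalg.qr(A)
sv = np.linalg.svd(S @ U, compute_uv=False)
print(sv.min(), sv.max())                  # typically within [1 - eps, 1 + eps]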
Least-squares regression
- Input is n points in d+1 dimensions
- Points have the form (\bA_{i*},\bb_i),
where:
- \bA is an n\times d matrix
- \bA_{i*}\in\reals^d is row i of a matrix \bA
- \bb\in\reals^n is a column vector
- Regression [G95][L05]: find a good \bx\in\reals^d so that \bA_{i*}\bx is a good prediction for \bb_i
- "Good" = \min_{\bx\in\reals^d}\sum_i (\bA_{i*}\bx -\bb_i)^2 = \min_{\bx\in\reals^d}\norm{\bA\bx -\bb}_2^2
- \br\equiv\bA\bx -\bb is the residual
- \norm{\br}_2^2 = \sum_i\br_i^2, the squared Euclidean length of \br
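As a minimal worked example of this objective (synthetic data, not from the slides), numpy's least-squares solver computes the minimizer directly:

import numpy as np

rng = np.random.default_rng(1)
n, d = 1000, 10
A = rng.standard_normal((n, d))                  # rows A_{i*} are the data points
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

x_star, *_ = np.linalg.lstsq(A, b, rcond=None)   # argmin_x ||A x - b||_2
r = A @ x_star - b                               # residual
print(np.linalg.norm(r) ** 2)                    # sum_i r_i^2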
Subspace embeddings for regression
Suppose \bS is a subspace embedding for \colspan([\bA\ \bb]), so
\norm{\bS [\bA\ \bb]\by }\approx_\eps\norm{[\bA\ \bb]\by }\text{ for }\by\in\R^{d+1}.
Then in particular this is true when \by=\left[\begin{smallmatrix}\bx\\ -1\end{smallmatrix}\right], \bx\in\R^d, so
\norm{\bS (\bA\bx -\bb)}\approx_\eps\norm{\bA\bx -\bb} = \norm{\br}.
That is,
if
\begin{align*}
\tx\equiv & \argmin_{\bx\in\R^d}\norm{\bS (\bA\bx -\bb)}
\\ \bx^*\equiv & \argmin_{\bx\in\R^d}\norm{\bA\bx -\bb},\end{align*}
then
\begin{align*}
\norm{\bA\tx-\bb}
& \le\frac{\norm{\bS (\bA\tx-\bb)}}{1-\eps}
\le\frac{\norm{\bS (\bA\bx^*-\bb)}} {1-\eps}
\le\norm{\bA\bx^*-\bb}\frac{1+\eps}{1-\eps},
\end{align*}
and so for \eps\le 1/3, \norm{\bA\tx-\bb}\le\norm{\bA\bx^*-\bb}(1+3\eps), since \frac{1+\eps}{1-\eps}=1+\frac{2\eps}{1-\eps}\le 1+3\eps when \eps\le 1/3.
Solving least-squares in "sketch space"
yields a good solution in the original space.
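A minimal "sketch-and-solve" illustration in numpy, assuming a plain Gaussian sketch and illustrative sizes (the slides do not fix a particular distribution here): solve the small sketched problem and compare its residual on the original data with the exact optimum.

import numpy as np

rng = np.random.default_rng(2)
n, d, m = 10_000, 30, 600
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + rng.standard_normal(n)

x_star, *_ = np.linalg.lstsq(A, b, rcond=None)            # exact: argmin ||A x - b||
S = rng.standard_normal((m, n)) / np.sqrt(m)               # oblivious Gaussian sketch
x_tilde, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)    # argmin ||S(A x - b)||

print(np.linalg.norm(A @ x_star - b))     # ||A x* - b||
print(np.linalg.norm(A @ x_tilde - b))    # ||A x~ - b||, typically only slightly larger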
The advantages of obliviousness
\bS is oblivious: the distribution \cal D
does not depend on the input data.
Algorithmic advantages:
- Construct \bS , then input \bA
- Streaming: when an entry of \bA changes, \bS\bA is easy to update
- Distributed: if \bA = \sum_p\bA^p over processors p:
- Compute \bS\bA^p at each processor
- Ship to master for summation
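A small numpy illustration of both points above, assuming for simplicity that a Gaussian \bS is stored explicitly (in practice one often uses a sketch whose columns can be regenerated on the fly, e.g., from hash functions):

import numpy as np

rng = np.random.default_rng(3)
n, d, m = 5000, 8, 100
S = rng.standard_normal((m, n)) / np.sqrt(m)
A = rng.standard_normal((n, d))
SA = S @ A

# Streaming: if entry A[i, j] changes by delta, only column j of S A changes,
# by delta times column i of S.
i, j, delta = 7, 3, 0.5
A[i, j] += delta
SA[:, j] += delta * S[:, i]
print(np.allclose(SA, S @ A))                                      # True

# Distributed: if A = sum_p A^p, each processor sketches its own part and
# the master only sums the small m x d sketches.
parts = [rng.standard_normal((n, d)) for _ in range(3)]
print(np.allclose(sum(S @ Ap for Ap in parts), S @ sum(parts)))    # True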
The advantages of obliviousness: analytic
Analysis advantages: for any condition \cE
\cE\text{ holds for }\bS\bA\bx\text{ for all }\bx\in\R^d\\
\iff\\
\cE\text{ holds for }\bS\by\text{ for all }\by\in\colspan(\bA )\\
\iff\\
\cE\text{ holds for }\bS\bU\mathbf{z}\text{ for all }\mathbf{z}\in\R^{\rank(\bA )},\\
\qquad\qquad\bU\text{ an orthonormal basis of }\colspan(\bA )
So often WLOG,
\bA has orthonormal columns (and hence full column rank), that is, \bA is an orthonormal matrix.
Also, constraints "for free", meaning:
\min_{\bx\in{\cal C}}\norm{\bS\bA\bx }\approx\min_{\bx\in{\cal C}}\norm{\bA\bx },
for any constraint set \cal C, such as \cC=\{\bx\mid\bx\ge 0\}.
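For example, a nonnegativity-constrained regression can be solved directly on the sketched data; the snippet below is an illustrative sketch using scipy.optimize.nnls and a Gaussian \bS, not a prescription from the slides.

import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(4)
n, d, m = 5_000, 15, 400
A = rng.standard_normal((n, d))
b = A @ np.abs(rng.standard_normal(d)) + 0.1 * rng.standard_normal(n)
S = rng.standard_normal((m, n)) / np.sqrt(m)

x_full, _ = nnls(A, b)            # min_{x >= 0} ||A x - b||, full data
x_sk, _ = nnls(S @ A, S @ b)      # same constraint, solved in sketch space

print(np.linalg.norm(A @ x_full - b), np.linalg.norm(A @ x_sk - b))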