\(
\newcommand\norm[1]{|| #1 ||}
\newcommand{\myvertiii}[1]{{\vert\kern-0.2ex\vert\kern-0.2ex\vert #1 \vert\kern-0.2ex\vert\kern-0.2ex\vert}}
\newcommand{\norme}[1]{{\myvertiii{#1}}}
\newcommand\nnz[1]{{\textrm{nnz}(#1)}}
\newcommand\E{\mathbf{E}}
\newcommand\Dist{\mathbf{Dist}}
\newcommand\reals{{\mathrm{I\!R}}}
\newcommand\R{{\mathrm{I\!R}}}
\newcommand\poly{{\textrm{poly}}}
\newcommand\rank{{\textrm{rank}}}
\newcommand\colspan{{\textrm{range}}}
\newcommand\diam{{\textrm{diam}}}
\newcommand\polylog{{\textrm{polylog}}}
\newcommand\rowspan{{\textrm{rowspan}}}
\newcommand\colbasis{{\textrm{colbasis}}}
\newcommand\argmin{{\mathrm{argmin}}}
\newcommand\Err{{\mathtt{NE}}}
\newcommand\eps{{\epsilon}}
\newcommand\bfone{{\mathbf{1}}}
\newcommand\Var{{\mathbf{Var}}}
\newcommand\stdev[1]{{\mathbf{stdev(#1)}}}
\newcommand\cgf{{\mathbf{CGF}}}
\newcommand\Iden{{\mathbf{I}}}
\newcommand\Prob{{\mathbf{Pr}}}
\DeclareMathOperator\subg{\mathbf{subG}}
\newcommand\cN{{\cal N}}
\newcommand\cS{{\cal S}}
\newcommand\cD{{\cal D}}
\newcommand\cM{{\cal M}}
\newcommand\cE{{\cal E}}
\newcommand\cC{{\cal C}}
\newcommand\KJL{K_{\mathbb{JL}}}
\newcommand\tO{{\tilde O}}
\newcommand\tx{{\tilde{\mathbf{x}}}}
\newcommand\tX{{\tilde{\mathbf{X}}}}
\newcommand\tY{{\tilde{\mathbf{Y}}}}
\newcommand\tA{{\tilde{\mathbf{A}}}}
\newcommand\bA{{\mathbf{A}}}
\newcommand\ba{{\mathbf{a}}}
\newcommand\bw{{\mathbf{w}}}
\newcommand\bW{{\mathbf{W}}}
\newcommand\bR{{\mathbf{R}}}
\newcommand\bB{{\mathbf{B}}}
\newcommand\bQ{{\mathbf{Q}}}
\newcommand\bZ{{\mathbf{Z}}}
\newcommand\bz{{\mathbf{z}}}
\newcommand\bU{{\mathbf{U}}}
\newcommand\bV{{\mathbf{V}}}
\newcommand\bX{{\mathbf{X}}}
\newcommand\bS{{\mathbf{S}}}
\newcommand\hS{\hat {\mathbf{S}}}
\newcommand\bs{{\mathbf{s}}}
\newcommand\bx{{\mathbf{x}}}
\newcommand\by{{\mathbf{y}}}
\newcommand\bb{{\mathbf{b}}}
\newcommand\bT{{\mathbf{T}}}
\newcommand\bP{{\mathbf{P}}}
\newcommand\bG{{\mathbf{G}}}
\newcommand\bY{{\mathbf{Y}}}
\newcommand\hY{\hat {\mathbf{Y}}}
\newcommand\bH{{\mathbf{H}}}
\newcommand\bD{{\mathbf{D}}}
\newcommand\be{{\mathbf{e}}}
\newcommand\br{{\mathbf{r}}}
\newcommand\bSig{{\mathbf{\Sigma}}}
\newcommand\bPi{{\mathbf{\Pi}}}
\newcommand\hphi{{\hat\phi}}
\)
Sketching for Matrix Computations:
Sparse Embeddings
Ken Clarkson
IBM Research, Almaden
- These slides at
http://kenclarkson.org/G2S3_2015/
- Enable JavaScript, navigate as in PowerPoint
- Math may take some time to render properly; if it doesn't end up looking right, try reloading
Faster embeddings: CountSketch
A sketching matrix of i.i.d. sub-Gaussians is good but slow:
- `\bS ` is an embedding with high probability
- Computing `\bS\bA ` takes `\Omega(mnd)` time
We will show a faster scheme, and use the reduction
embedding `\rightarrow` mat mult `\rightarrow` single estimate
to analyze it.
Sparse Embeddings
- Adaptation of
CountSketch
from streaming algorithms
- `\bS ` looks like:
$\qquad\qquad\left[\begin{matrix}
0 & -1 & 0 & 0 & +1 & 0
\\ +1 & 0 & 0 & +1 & 0 & 0
\\ 0 & 0 & +1 & 0 & 0 & -1
\\ 0 & 0 & 0 & 0 & 0 & 0
\end{matrix}
\right]$
- One random `\pm 1` per column
- Row `\bA_{i*}` of `\bA ` contributes `\pm\bA_{i*}` to one row of `\bS\bA `
- `m= \tO(d^2)`
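As an illustration (not part of the original slides), here is a minimal NumPy sketch that builds such an `\bS ` explicitly; the sizes and the helper name `countsketch_matrix` are ad hoc choices for the demo.

```python
import numpy as np

def countsketch_matrix(m, n, rng=None):
    """Explicit m x n CountSketch matrix: one random +/-1 per column,
    placed in a uniformly random row."""
    rng = np.random.default_rng(rng)
    h = rng.integers(0, m, size=n)          # hash bucket for each column
    s = rng.choice([-1.0, 1.0], size=n)     # sign flip for each column
    S = np.zeros((m, n))
    S[h, np.arange(n)] = s                  # S[h_i, i] = s_i
    return S

print(countsketch_matrix(4, 6, rng=0))      # same shape as the example above
```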
Sparse embeddings, more precisely
For `i\in [n]`,
pick uniformly and independently:
- `h_i\in [m]`
- `s_i\in\{+1,-1\}`
Define `\bS\in\R^{m\times n}` by
$$\bS_{h_i, i}\gets s_i, \text{ for } i\in [n],$$
and `\bS_{j,i} \gets 0` otherwise
The `s_i` are "sign flips"
The array `h` hashes to `m` "hash buckets"
That is,
$$\bS_{j,*} = \sum_{i:\,h_i=j} s_i \be_i^\top,$$
so
$$ [\bS\bA]_{j,*} = \sum_{i:\,h_i=j} s_i \be_i^\top \bA = \sum_{i:\,h_i=j} s_i \bA_{i,*}$$
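Taken literally, this formula gives a way to apply the sketch in time proportional to `\nnz(\bA)`: scatter-add signed rows of `\bA ` into buckets, never forming `\bS `. A hedged NumPy sketch of that idea (the function name and interface are illustrative, not from the slides):

```python
import numpy as np

def countsketch_apply(A, m, rng=None):
    """Compute S @ A without forming S, in O(nnz(A)) row operations:
    row A[i, :] is added, with sign s_i, into output row h_i."""
    rng = np.random.default_rng(rng)
    n, d = A.shape
    h = rng.integers(0, m, size=n)          # h_i: bucket for row i
    s = rng.choice([-1.0, 1.0], size=n)     # s_i: sign flip for row i
    SA = np.zeros((m, d))
    np.add.at(SA, h, s[:, None] * A)        # scatter-add rows into buckets
    return SA
```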
Sparse Embeddings, More
- For `\by = \bA\bx`, each row of sparse embedding `\bS `:
- Collects a subset of entries of `\by ` (into a "bucket")
- Applies random signs
- Adds
- Result is `\E[\norm{\bS\by }_2^2] = \norm{\by }_2^2` (checked empirically after this list):
- Each bucket squared `\approx` squared norm of its entries
- Splitting into buckets separates big entries
- Even with `m = n` buckets, we don't always get `\norm{\by }_2^2` exactly
- (the entry `\rightarrow` bucket map is not one-to-one)
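The expectation claim above is easy to test empirically; a small Monte Carlo sketch, with ad hoc sizes, assuming the construction above:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, trials = 100, 20, 20000
y = rng.standard_normal(n)

est = 0.0
for _ in range(trials):
    h = rng.integers(0, m, size=n)
    s = rng.choice([-1.0, 1.0], size=n)
    Sy = np.zeros(m)
    np.add.at(Sy, h, s * y)                 # bucket, sign-flip, add
    est += Sy @ Sy / trials

print(est, y @ y)   # averages agree; any single trial can be off
```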
Analysis of sparse embeddings, 1/3
From the analysis via matrix multiplication,
it's enough to analyze, for unit vector `\by`,
$$\begin{align*}
\E[(\norm{\bS\by }^2-1)^2]
& = \E[\norm{\bS\by}^4] - 2\E[\norm{\bS\by}^2] + 1
\\ & = \E[\norm{\bS\by}^4] - 1,
\end{align*}
$$
using `\E[\norm{\bS\by }^2] = \norm{\by }^2 = 1` for unit `\by `.
Let `\bz = \bS\by`.
$$\begin{align*}
\E_s[ & (\norm{\bS\by }^2 - 1)^2] + 1
= \E_s[\norm{\bz}^4]
\\ & = \E_s\left[\left[\sum_j \bz_j^2\right]^2\right]
= \sum_{j,j'} \E_s[\bz_j^2\bz_{j'}^2]
\\ & = \sum_{j,j'} \E_s[\bz_j^2]\E_s[\bz_{j'}^2] + \sum_j \left(\E_s[\bz_j^4] - \E_s[\bz_j^2]^2\right)
\\ & = 1 + \sum_j \left(\E_s[\bz_j^4] - \E_s[\bz_j^2]^2\right),
\end{align*}
$$
using that, conditioned on `h`, distinct `\bz_j` involve disjoint sets of sign flips (so are independent over `s`), and `\sum_j \E_s[\bz_j^2] = \norm{\by }^2 = 1`.
Analysis of sparse embeddings, 2/3
We have
$$\bz_j = [\bS\by]_j = \sum_{i:\,h_i=j} s_i \by_i,$$
so
$$\E_s[\bz_j^4] = \sum_{i,i',i'',i''':\, h_i = h_{i'} = h_{i''} = h_{i'''} = j} \E_s[s_i s_{i'} s_{i''} s_{i'''}]\,\by_i \by_{i'} \by_{i''} \by_{i'''},$$
where `\E_s[s_i s_{i'} s_{i''} s_{i'''}]` is zero unless the indices pair up:
all four are equal, or they form two distinct pairs, which can happen in three ways. So
$$\E_s[\bz_j^4] = \sum_{i:\, h_i=j}\by_i^4 + 3\sum_{i\ne i':\, h_i = h_{i'} = j}\by_i^2\by_{i'}^2.$$
Since `\E_s[\bz_j^2]^2 = \left(\sum_{i:\,h_i=j}\by_i^2\right)^2\ge\sum_{i:\,h_i=j}\by_i^4`, the diagonal terms cancel:
$$\E_s[\bz_j^4] - \E_s[\bz_j^2]^2 \le 3\sum_{i\ne i':\, h_i = h_{i'} = j}\by_i^2\by_{i'}^2.$$
Summing over `j\in[m]` and taking the expectation over `h`, each pair `i\ne i'` collides (`h_i = h_{i'}`) with probability `1/m`, so
$$\E[(\norm{\bS\by }^2 - 1)^2]\le\frac{3}{m}\sum_{i\ne i'}\by_i^2\by_{i'}^2\le\frac{3}{m}\norm{\by }^4 = \frac{3}{m}.$$
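The `3/m` bound can be sanity-checked numerically. In this illustrative sketch (sizes arbitrary), the unit vector has spread-out entries, the regime where bucket collisions matter most:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, trials = 200, 10, 50000
y = np.ones(n) / np.sqrt(n)                 # unit vector, spread-out entries

second_moment = 0.0
for _ in range(trials):
    h = rng.integers(0, m, size=n)
    s = rng.choice([-1.0, 1.0], size=n)
    Sy = np.zeros(m)
    np.add.at(Sy, h, s * y)
    second_moment += (Sy @ Sy - 1.0) ** 2 / trials

print(second_moment, "<=", 3.0 / m)         # observed moment vs. the 3/m bound
```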
Analysis of sparse embeddings, 3/3
We have for `\bS ` a sparse embedding, and unit `\by `,
$$\norme{\Err(\by )}_{2,\bS } = \E[(\norm{\bS\by }^2 - 1)^2]^{1/2}\le K/\sqrt{m},$$
for a constant `K` (here `K = \sqrt{3}`).
By the bound for embedding via matrix multiplication,
if `K/\sqrt{m}\le\beta\delta^{1/p}` (here the moment parameter is `p=2`),
then `\bS ` is a `(\beta r)`-embedding with failure probability `\delta`.
Here `r=\rank(\bA)`.
With `\delta = 1/9` (so `\delta^{1/2} = 1/3`) and `\beta = \eps/r`, the condition becomes `\eps/r \ge 3K/\sqrt{m}`, which yields:
For
- `\eps\in (0,1)`
- `r\equiv\rank(\bA )`
There is
- `m=O(r^2/\eps^2)`
- `\bS\in\R^{m\times n}` a CountSketch matrix
such that
with failure probability `1/9`, `\bS ` is an `\eps`-embedding of `\colspan(\bA )`.
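To see the guarantee concretely: `\bS ` is an `\eps`-embedding of `\colspan(\bA )` exactly when all squared singular values of `\bS\bU ` lie in `[1-\eps, 1+\eps]`, for `\bU ` an orthonormal basis of `\colspan(\bA )`. A hedged NumPy check (the constant in the choice of `m` is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 5000, 10
A = rng.standard_normal((n, d))             # full column rank, so r = d
U, _, _ = np.linalg.svd(A, full_matrices=False)   # orthonormal basis of range(A)

m = 4 * d * d                               # m = O(r^2); constant chosen ad hoc
h = rng.integers(0, m, size=n)
s = rng.choice([-1.0, 1.0], size=n)
SU = np.zeros((m, d))
np.add.at(SU, h, s[:, None] * U)

sv = np.linalg.svd(SU, compute_uv=False)
# eps-embedding iff all squared singular values of S @ U lie in [1-eps, 1+eps]
print(sv.min() ** 2, sv.max() ** 2)
```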
Slightly less sparse embeddings
So far: one nonzero per column.
There are sparse embeddings where
the sketch dimension `m` is smaller, and
the number `s` of nonzeros per column is slightly larger.
There are sparse subspace `\eps`-embedding matrices `\bS\in\R^{m\times n}`
with `sn` nonzeros for matrices `\bA\in\R^{n\times d}` of rank `r`, such
that
`s=O(\polylog(r)/\eps)` and `m=\tO(r/\eps^2)`, or
`s=O(1/\eps)` and `m=O(r^{1+\gamma}/\eps^2)` for any constant `\gamma \gt 0`.
In practice, small `s\gt 1` has better performance [D15].
Hybrid approaches are also possible, resulting in similar
performance for some problems.
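A hedged sketch of the `s > 1` variant (an OSNAP-style construction; the name is not used in the slides): each column of `\bS ` gets `s` nonzeros equal to `\pm 1/\sqrt{s}`. For simplicity this demo picks the `s` buckets per column independently, whereas some constructions pick them without replacement.

```python
import numpy as np

def sparse_embed_apply(A, m, s, rng=None):
    """Apply an m x n sparse embedding with s nonzeros per column of S,
    each +/- 1/sqrt(s), to A; bucket choices are independent here."""
    rng = np.random.default_rng(rng)
    n, d = A.shape
    SA = np.zeros((m, d))
    for _ in range(s):                       # s scatter-add passes
        h = rng.integers(0, m, size=n)
        sg = rng.choice([-1.0, 1.0], size=n) / np.sqrt(s)
        np.add.at(SA, h, sg[:, None] * A)
    return SA

A = np.random.default_rng(4).standard_normal((1000, 8))
SA = sparse_embed_apply(A, m=200, s=4)
```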