\(
\newcommand\norm[1]{|| #1 ||}
\newcommand{\myvertiii}[1]{{\vert\kern-0.2ex\vert\kern-0.2ex\vert #1 \vert\kern-0.2ex\vert\kern-0.2ex\vert}}
\newcommand{\norme}[1]{{\myvertiii{#1}}}
\newcommand\nnz[1]{{\textrm{nnz}(#1)}}
\newcommand\E{\mathbf{E}}
\newcommand\Dist{\mathbf{Dist}}
\newcommand\reals{{\mathrm{I\!R}}}
\newcommand\R{{\mathrm{I\!R}}}
\newcommand\poly{{\textrm{poly}}}
\newcommand\rank{{\textrm{rank}}}
\newcommand\colspan{{\textrm{range}}}
\newcommand\diam{{\textrm{diam}}}
\newcommand\polylog{{\textrm{polylog}}}
\newcommand\rowspan{{\textrm{rowspan}}}
\newcommand\colbasis{{\textrm{colbasis}}}
\newcommand\argmin{{\mathrm{argmin}}}
\newcommand\Err{{\mathtt{NE}}}
\newcommand\eps{{\epsilon}}
\newcommand\bfone{{\mathbf{1}}}
\newcommand\Var{{\mathbf{Var}}}
\newcommand\stdev[1]{{\mathbf{stdev(#1)}}}
\newcommand\cgf{{\mathbf{CGF}}}
\newcommand\Iden{{\mathbf{I}}}
\newcommand\Prob{{\mathbf{Pr}}}
\DeclareMathOperator\subg{\mathbf{subG}}
\newcommand\cN{{\cal N}}
\newcommand\cS{{\cal S}}
\newcommand\cD{{\cal D}}
\newcommand\cM{{\cal M}}
\newcommand\cE{{\cal E}}
\newcommand\cC{{\cal C}}
\newcommand\KJL{K_{\mathbb{JL}}}
\newcommand\tO{{\tilde O}}
\newcommand\tx{{\tilde{\mathbf{x}}}}
\newcommand\tX{{\tilde{\mathbf{X}}}}
\newcommand\tY{{\tilde{\mathbf{Y}}}}
\newcommand\tA{{\tilde{\mathbf{A}}}}
\newcommand\bA{{\mathbf{A}}}
\newcommand\ba{{\mathbf{a}}}
\newcommand\bw{{\mathbf{w}}}
\newcommand\bW{{\mathbf{W}}}
\newcommand\bR{{\mathbf{R}}}
\newcommand\bB{{\mathbf{B}}}
\newcommand\bQ{{\mathbf{Q}}}
\newcommand\bZ{{\mathbf{Z}}}
\newcommand\bz{{\mathbf{z}}}
\newcommand\bU{{\mathbf{U}}}
\newcommand\bV{{\mathbf{V}}}
\newcommand\bX{{\mathbf{X}}}
\newcommand\bS{{\mathbf{S}}}
\newcommand\hS{\hat {\mathbf{S}}}
\newcommand\bs{{\mathbf{s}}}
\newcommand\bx{{\mathbf{x}}}
\newcommand\by{{\mathbf{y}}}
\newcommand\bb{{\mathbf{b}}}
\newcommand\bT{{\mathbf{T}}}
\newcommand\bP{{\mathbf{P}}}
\newcommand\bG{{\mathbf{G}}}
\newcommand\bY{{\mathbf{Y}}}
\newcommand\hY{\hat {\mathbf{Y}}}
\newcommand\bH{{\mathbf{H}}}
\newcommand\bD{{\mathbf{D}}}
\newcommand\be{{\mathbf{e}}}
\newcommand\br{{\mathbf{r}}}
\newcommand\bSig{{\mathbf{\Sigma}}}
\newcommand\bPi{{\mathbf{\Pi}}}
\newcommand\hphi{{\hat\phi}}
\)
Sketching for Matrix Computations:
Multiplication and Embedding
Ken Clarkson
IBM Research, Almaden
- These slides at
http://kenclarkson.org/G2S3_2015/
- Enable JavaScript; navigate as in PowerPoint
- Math may take some time to render properly; if it doesn't end up looking right, try reloading
From matrix multiplication
to subspace embedding
Recall that an advantage of oblivious sketching is that
$$
\norm{\bS\bA\bx }\approx_\eps\norm{\bA\bx }\text{ for all }\bx\in\R^d\\
\iff\\
\norm{\bS\bU\bx }\approx_\eps\norm{\bx }\text{ for all }\bx\in\R^d,
$$
where `\bU ` has orthonormal columns
(here assuming `\dim\colspan(\bA )=d`).
Recall that the embedding condition is equivalent to
$$\norm{\bU^\top\bS^\top\bS\bU -\Iden}_2\le\eps.$$
Since `\bU^\top\bU = \Iden`:
A spectral error bound for `\bU^\top\bS^\top\bS\bU ` as an estimate
of `\bU^\top\bU ` implies a subspace embedding bound.
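As a quick numerical illustration (a minimal numpy sketch with a Gaussian `\bS `; the sizes and `\eps` are arbitrary choices, not from the slides):
```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m, eps = 4000, 10, 1000, 0.3

A = rng.standard_normal((n, d))               # any tall full-rank matrix
U, _ = np.linalg.qr(A)                        # orthonormal basis for range(A)
S = rng.standard_normal((m, n)) / np.sqrt(m)  # Gaussian sketch, E[S^T S] = I

# spectral error of U^T S^T S U as an estimate of U^T U = I
err = np.linalg.norm(U.T @ S.T @ S @ U - np.eye(d), 2)
print(f"||U'S'SU - I||_2 = {err:.3f}  (subspace embedding if <= eps = {eps})")

# equivalently, ||S A x|| should be within roughly eps of ||A x|| for any x
x = rng.standard_normal(d)
ratio = np.linalg.norm(S @ (A @ x)) / np.linalg.norm(A @ x)
print(f"||SAx|| / ||Ax|| = {ratio:.3f}")
```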
From subspace embedding
to matrix multiplication
There is also a kind of converse:
if the sketching error of `\bS `
is well-behaved, then
`\bS ` is good for matrix multiplication.
Suppose for
- Sketching matrix `\bS\in\R^{m\times n}`
- Any fixed unit `\by\in\R^n`
- Fixed `p\ge 2`
- `X\equiv |\,\norm{\bS\by }^2-1\,| = \Err_\bS(\by)`
that
$$\E[X^p]^{1/p}\le\frac12\eps\delta^{1/p}.$$
(In particular, sub-Gaussian i.d. `\bS ` satisfy this for `p=2`.)
Then for `\bA\in\R^{n\times d}`, `\bB\in\R^{n\times d'}`,
$$\norm{\bA^\top\bS^\top\bS\bB -\bA^\top\bB }_F\le\eps\norm{\bA }_F\norm{\bB }_F$$
with failure probability `\delta`.
Proof: for `p=2`, about six slides from now.
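For example, the following toy check (an illustrative numpy sketch; the sign sketch and dimensions are my choices, not the slides') computes `\Err_\bS(\bA ,\bB )` for a single draw of `\bS `:
```python
import numpy as np

rng = np.random.default_rng(1)
n, d, dp, m = 5000, 30, 40, 1000

A = rng.standard_normal((n, d))
B = rng.standard_normal((n, dp))

# sign (Rademacher) sketch: i.i.d. +-1 entries, scaled so E[S^T S] = I
S = rng.choice([-1.0, 1.0], size=(m, n)) / np.sqrt(m)

num = np.linalg.norm(A.T @ S.T @ S @ B - A.T @ B, 'fro')
rel = num / (np.linalg.norm(A, 'fro') * np.linalg.norm(B, 'fro'))
print(f"Err_S(A, B) = {rel:.4f}   (compare to roughly 1/sqrt(m) = {1/np.sqrt(m):.4f})")
```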
From embedding to matrix mult, and back
Applying this to `\bU `, an orthonormal basis for `\colspan(\bA )`,
we have
Suppose for
- Sketching matrix `\bS\in\R^{m\times n}`
- Fixed unit `\by\in\R^n`
- Fixed `p\ge 2`
- `X\equiv |\,\norm{\bS\by }^2-1\,|`
that
$$\E[X^p]^{1/p}\le\frac12\beta\delta^{1/p}.$$
(In particular, sub-Gaussian i.d. `\bS ` satisfy this for `p=2`.)
Then for `\bA\in\R^{n\times d}` of rank `r`,
and `\bU ` an orthonormal basis for `\colspan(\bA )`,
$$\norm{\bU^\top\bS^\top\bS\bU -\Iden}_2
\le \norm{\bU^\top\bS^\top\bS\bU -\Iden}_F
\le\beta\norm{\bU }_F^2 = \beta r,$$
so that `\bS ` is a `\beta r`-embedding, with failure probability `\delta`.
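Numerically, the Frobenius bound in this chain is typically much looser than the true spectral error; that looseness is the price of going through matrix multiplication. A minimal numpy sketch (sizes are arbitrary choices):
```python
import numpy as np

rng = np.random.default_rng(2)
n, r, m = 4000, 10, 1000

U, _ = np.linalg.qr(rng.standard_normal((n, r)))  # orthonormal, ||U||_F^2 = r
S = rng.standard_normal((m, n)) / np.sqrt(m)      # Gaussian sketch

E = U.T @ S.T @ S @ U - np.eye(r)
spec, frob = np.linalg.norm(E, 2), np.linalg.norm(E, 'fro')
# Err_S(U) = frob / r is small, but the embedding error this route certifies is frob itself
print(f"Err_S(U) = {frob / r:.3f},  ||.||_F = {frob:.3f}  >=  ||.||_2 = {spec:.3f}")
```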
Moments of sub-Gaussians
and sub-exponentials
The quantity `\E[|X|^p]^{1/p}` grows slowly with `p` for sub-Gaussians.
For random variables `X,Y`, let `\norme{X}_{p,Y}\equiv\E_Y[|X|^p]^{1/p}`,
or just `\norme{X}_p` when `Y` is understood.
Moments of sub-G and sub-exp (NPH)
`X` is sub-Gaussian `\iff\norme{X}_{p}\lt K_X\sqrt{p}` for all `p\ge 1`
`X` is sub-exponential `\iff\norme{X}_{p}\lt K_X p` for all `p\ge 1`
(Here `K_X` depends on `X`, independent of `p`.)
- (Gaussian) If `X\sim N(0,1)`, then `\norme{X}_p = K\Gamma(\frac{p+1}2)^{1/p}= \Theta(\sqrt{p})`
- (Sign) If `\mathbf{X}\in\R^n` has `\mathbf{X}_i= \pm 1` uniformly i.i.d.,
then for unit `\by\in\R^n`,
`\norme{\mathbf{X}^\top\by }_p = \Theta(\sqrt{p})`
- This is an expression of Khintchine's inequality
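A small Monte Carlo estimate of these moment growth rates (an illustrative numpy sketch; the sample sizes are arbitrary, and the estimates for larger `p` are noisy):
```python
import numpy as np

rng = np.random.default_rng(3)
n, trials = 50, 100_000
y = rng.standard_normal(n)
y /= np.linalg.norm(y)                               # fixed unit vector

g = rng.standard_normal(trials)                      # N(0,1) samples
x = rng.choice([-1.0, 1.0], size=(trials, n)) @ y    # sign-vector dot products X'y

for p in (1, 2, 4, 8):
    mg = np.mean(np.abs(g) ** p) ** (1 / p)
    ms = np.mean(np.abs(x) ** p) ** (1 / p)
    print(f"p={p}:  Gaussian {mg:.2f}   X'y {ms:.2f}   sqrt(p) = {np.sqrt(p):.2f}")
```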
The norm `\norme{\cdot}_{p, X}`
The notation is not a coincidence: random variables on a probability space
with finite `p`-th moments form a normed linear space (the space `L_p`), with norm `\norme{\cdot}_p`:
$$\norme{X+Y}_{p}\le\norme{X}_{p} +\norme{Y}_{p}$$
For `\alpha\ge 0`, `\norme{\alpha X}_{p} = \alpha\norme{X}_{p}`
`\Prob\{|X|\ge t\}\le\norme{X}_p^p/t^p`.
cf. the concentration of sub-Gaussians;
for `p=1`, this is Markov's inequality, and for `p=2`, Chebyshev
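A toy illustration of the moment form of Markov's inequality (a minimal numpy sketch; the Gaussian test variable and threshold are arbitrary choices): higher moments give sharper tail bounds, which is the mechanism behind sub-Gaussian concentration.
```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal(1_000_000)   # any random variable with finite moments
t = 3.0

for p in (1, 2, 4, 8):
    bound = np.mean(np.abs(X) ** p) / t ** p   # ||X||_p^p / t^p
    print(f"p={p}: moment bound {bound:.4f}")
print(f"empirical Pr(|X| >= {t}) = {np.mean(np.abs(X) >= t):.4f}")
```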
From embedding to matrix multiplication:
moment norm of error
Recall that for vectors `\ba,\bb`,
`\Err_\bS(\ba,\bb)\equiv\frac{ |\ba^\top\bS^\top\bS\bb -\ba^\top\bb | }{\norm{\ba}\,\norm{\bb}}` and `\Err_\bS(\ba)\equiv\Err_\bS(\ba,\ba)`.
Now generalize this to matrices:
`\Err_\bS(\bA ,\bB )\equiv\frac{\norm{\bA^\top\bS^\top\bS\bB -\bA^\top\bB }_F }{\norm{\bA }_F\norm{\bB }_F}` and `\Err_\bS(\bA )\equiv\Err_\bS (\bA ,\bA )`.
For fixed `\bA `, `\bB `, `p\ge 2` and `\gamma\equiv \sup_\by\norme{\Err(\by )}_{p,\bS }`,
$$\norm{\Err(\bA ,\bB )}_{p,\bS }\le 2\gamma.$$
Proof (for `p=2` only):
Recall that `\Err(\ba,\bb)\le\Err(\frac{\ba}{\norm{\ba}}+\frac{\bb}{\norm{\bb}}) +\Err(\frac{\ba}{\norm{\ba}}-\frac{\bb}{\norm{\bb}})`;
for given `\ba,\bb`, using the triangle inequality for `\norme{\cdot}_2`,
$$\E[\Err(\ba,\bb)^2]^{1/2} = \norme{\Err(\ba,\bb)}_2\le 2\gamma.$$
From embedding to matrix multiplication:
dot product to `F`-norm
Letting `\Err_{ij}\equiv\Err(\bA_{*i},\bB_{*j})`
and `\mathbf{H}\equiv\bA^\top\bS^\top\bS\bB -\bA^\top\bB `,
$$\norm{\mathbf{H}}_F^2 = \sum_{i\in[d], j\in[d']}\mathbf{H}_{ij}^2 = \sum_{i\in[d],j\in[d']}\norm{\bA_{*i}}^2\norm{\bB_{*j}}^2\Err_{ij}^2.$$
We have
$$
\begin{align*}
\E[\norm{\mathbf{H}}_F^2]
& = \sum_{i\in[d],j\in[d']}\norm{\bA_{*i}}^2\norm{\bB_{*j}}^2\E[\Err_{ij}^2]
\\ & \le\sum_{i\in[d],j\in[d']}\norm{\bA_{*i}}^2\norm{\bB_{*j}}^2 (2\gamma)^2
= 4\gamma^2\norm{\bA }_F^2\norm{\bB }_F^2
\end{align*}
$$
So $$\norme{\Err(\bA ,\bB )}_2 = \E[\norm{\mathbf{H}}_F^2]^{1/2}/(\norm{\bA }_F\norm{\bB }_F)\le 2\gamma,$$
as claimed, for `p=2`.
For `p>2`, use `\norme{X}_p^2 = \norme{X^2}_{p/2}` and the triangle inequality for `\norme{\cdot}_{p/2}`.
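Spelled out (this step is not on the slide; it is the standard calculation, as in Nelson's notes): since `p/2\ge 1`, the triangle inequality applies to `\norme{\cdot}_{p/2}`, so
$$
\begin{align*}
\norme{\Err(\bA ,\bB )}_p^2\,\norm{\bA }_F^2\norm{\bB }_F^2
& = \norme{\,\norm{\mathbf{H}}_F^2\,}_{p/2}
\le \sum_{i\in[d],j\in[d']}\norm{\bA_{*i}}^2\norm{\bB_{*j}}^2\norme{\Err_{ij}^2}_{p/2}
\\ & = \sum_{i\in[d],j\in[d']}\norm{\bA_{*i}}^2\norm{\bB_{*j}}^2\norme{\Err_{ij}}_p^2
\le (2\gamma)^2\norm{\bA }_F^2\norm{\bB }_F^2,
\end{align*}
$$
so again `\norme{\Err(\bA ,\bB )}_p\le 2\gamma`.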
From embedding to matrix multiplication:
`F`-norm to tail
We can now apply the moment Markov inequality
with
$$\norme{\Err(\bA ,\bB )}_p\le\eps\delta^{1/p}\text{ and } t = \eps,$$
so that
$$\Prob\{\Err(\bA ,\bB )\ge t\}\le\norme{\Err(\bA ,\bB )}_p^p/t^p\le (\eps\delta^{1/p})^p/\eps^p = \delta$$
to obtain the matrix multiplication result
$$
\begin{align*}
\sup_\by \norme{\Err(\by )}_p\le\frac12\eps\delta^{1/p}
& \implies\norme{\Err(\bA ,\bB )}_p\le\eps\delta^{1/p}
\\ & \implies\Prob\{\Err(\bA ,\bB )\ge\eps\}\le\delta.
\end{align*}
$$
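As a sanity check of the tail statement, the failure probability can be estimated empirically (an illustrative numpy sketch; the sign sketch, dimensions, and trial count are arbitrary choices):
```python
import numpy as np

rng = np.random.default_rng(5)
n, d, dp, m, eps, trials = 1000, 10, 10, 400, 0.2, 200

A = rng.standard_normal((n, d))
B = rng.standard_normal((n, dp))
exact = A.T @ B
denom = np.linalg.norm(A, 'fro') * np.linalg.norm(B, 'fro')

errs = []
for _ in range(trials):
    S = rng.choice([-1.0, 1.0], size=(m, n)) / np.sqrt(m)   # fresh sign sketch each trial
    errs.append(np.linalg.norm(A.T @ S.T @ S @ B - exact, 'fro') / denom)
errs = np.array(errs)
print(f"median Err_S(A,B) = {np.median(errs):.3f},  "
      f"empirical Pr(Err >= {eps}) = {np.mean(errs >= eps):.3f}")
```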
Further Reading
The matrix multiplication results are cribbed from
Jelani Nelson's lecture notes.
See also:
Daniel M. Kane, Jelani Nelson. Sparser Johnson-Lindenstrauss transforms. SODA, 1195-1206, 2012.