
Tighter Bounds for Random Projections of Manifolds
Ken Clarkson
IBM Almaden

Overview
 Dimensionality reduction
 Random projection
 For finite sets, for infinite sets
 Smooth manifolds
 Complexity measures of manifolds
 The "tighter bound"
 Parts of the proof
 Distance measures, dimension, measure, `epsilon`-nets
 Reduction theorem: from infinite sets to finite sets
 Application of the reduction theorem to smooth manifolds

Dimensionality Reduction
 Given a high-dimensional dataset in `\RR^m`, map it to a lower-dimensional space
 Maybe just drop some coordinates
 Some dimensions are features, others are not
 Or, rotate the data first, then drop coordinates, or do something even more complicated
 SVD (PCA, LSI, EigenFace*), SDP, ICA, MDS, etc.
 *See also: EigenEyebrow, EigenEye, EigenNose, EigenMouth, EigenHead....
 EigenHand, EigenBody, EigenHeart...
 EigenSign, EigenImage, EigenFish, EigenForm, EigenTracking, EigenWindow, EigenGait,
EigenLightField, EigenSurface, EigenFeature, Eigen Lightfield, EigenScaleSpace, Eigen Nodule, EigenProsody,
EigenShape, EigenTree, EigenEdge, EigenEdginess, EigenHills, Eigen (grapefruit) stems,
EigenCharacter, EigenSignature, EigenWord, EigenSign, EigenLetter, EigenScrabble**
 **Not: EigenCluster, EigenMonkey

Random projection
 Procedure:
 Apply a random rotation to `v in RR^m`
 Drop all but `k` coordinates
 Scale (multiply by a scalar) appropriately
 Equivalently: pick a random subspace of dimension `k`, project `v` onto it, then scale
 Johnson-Lindenstrauss (JL) Lemma: this preserves length, approximately:
 Let a `k`-map `bb P` be a random projection from `\RR^m` to `\RR^k`, as above
 If `k >= epsilon^{-2} C log(1//delta)`, then with probability at least `1-delta`,
`(1-epsilon)||v|| le ||bb P v|| le (1+epsilon)||v||`
 Since `bb P` is linear, `||bb P(alpha v)|| = alpha ||bb P v||` for `alpha ge 0`, so WLOG `||v|| = 1`
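As a sanity check, the length-preservation claim is easy to simulate; a minimal numpy sketch, where the sizes `m`, `k` are illustrative and the scaled Gaussian matrix stands in for the rotate-drop-scale procedure:

```python
import numpy as np

# Minimal sketch of a k-map: a k x m matrix of i.i.d. Gaussians scaled
# by 1/sqrt(k) behaves like projection onto a random k-subspace followed
# by the appropriate rescaling. Sizes here are illustrative.
rng = np.random.default_rng(0)
m, k = 1000, 200
P = rng.standard_normal((k, m)) / np.sqrt(k)

v = rng.standard_normal(m)
ratio = np.linalg.norm(P @ v) / np.linalg.norm(v)
# By the JL lemma, this ratio is close to 1 with high probability
print(ratio)
```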

From one point to many
 Point drafting: for one vector (point) `v`, the probability of failure is
`delta le exp(-k epsilon^2//C)`
 Finite set drafting: for a set `S` of `n` points, the probability of failure for all points is
`delta le n exp(-k epsilon^2//C)`
 Finite set embedding: for `S - S := {x - y quad | quad x,y in S}`,
`delta le n^2 exp(-k epsilon^2//C)`
 `k = O(epsilon^{-2} log(n//delta))`
 That is, preserving distances
 Angles preserved also [M]
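The union bound over pairs can be spot-checked numerically as well; again a hedged sketch with illustrative sizes, not the sharp constants of the lemma:

```python
import numpy as np

# Project n random points from R^m to R^k and measure the worst relative
# distortion over all pairwise distances; with k on the order of
# eps^{-2} log n, it should stay small.
rng = np.random.default_rng(1)
m, n, k = 500, 50, 200
S = rng.standard_normal((n, m))
P = rng.standard_normal((k, m)) / np.sqrt(k)

worst = 0.0
for i in range(n):
    for j in range(i + 1, n):
        d = np.linalg.norm(S[i] - S[j])
        dp = np.linalg.norm(P @ (S[i] - S[j]))
        worst = max(worst, abs(dp / d - 1.0))
print(worst)
```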

Random projection : why?
 It works
 Existence proof: if a random projection works this well, what if we really try to get a good projection?
 There are many similar algorithms with the same properties
 Multiply by a `k times m` matrix of random `pm 1`, or of Gaussians
 Obliviousness: the random projection is chosen without looking at the data at all
 ...and so is called "universal feature reduction"
 Feature reduction without "feedback": no loops
 Brain may work this way; a recent model of the brain [SOP]:
 Is a "feedforward" neural network
 Uses randomness for feature reduction in a similar way

From many to infinite
 Subspace JL [Sar]: for a `d`-dimensional linear subspace `F`,
`delta = O(1)^d exp(-k epsilon^2//C)`
 Hint:
 It helps that if `x,y in F`, so is `x - y`, and so is `alpha x`
 There is a finite subset of `F` so that drafting it `=>` drafting `F`
 "Doubling" JL [AHY][IN]: Bounds for sets in `RR^m` of bounded doubling dimension
 Mostly additive bounds on distance approximation, not relative
 Manifold JL [AHY][BW], here: for a (smooth, connected)
`d`-dimensional manifold,
`delta = O(epsilon^{-d} exp(-k epsilon^2//C))`
 Main bounds of [AHY] are additive, or drafting only

(What's a `d`-manifold?)
 A manifold is a set that looks, close up, like a linear subspace
 A curve is a 1-manifold
 A surface is a 2-manifold in 3 dimensions
 If `f : RR^d -> RR^m` is a nice function, `{f(x) | x in P subset RR^d}` is a `d`-manifold in `RR^m`
 Given a collection of `m` radio antennas,
if signal strength at point `p` is
`(s_1(p), ..., s_m(p))`,
then `{(s_1(p), ..., s_m(p)) | p in RR^2}` is a 2-manifold in `RR^m`
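The antenna example can be made concrete; the decay law in the sketch below is an assumed choice, just to get a smooth map from `RR^2` into `RR^m`:

```python
import numpy as np

# m antennas at fixed locations in the plane; signal strength decays
# smoothly with distance (an assumed decay law, for illustration only).
rng = np.random.default_rng(2)
m = 20
antennas = rng.uniform(-1.0, 1.0, size=(m, 2))

def signals(p):
    # (s_1(p), ..., s_m(p)): a smooth function of p in the plane
    d2 = np.sum((antennas - p) ** 2, axis=1)
    return 1.0 / (1.0 + d2)

# Sample points of the 2-manifold {(s_1(p),...,s_m(p)) | p in RR^2} in RR^m
pts = np.array([signals(p) for p in rng.uniform(-1.0, 1.0, size=(100, 2))])
print(pts.shape)
```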

(When is the input to a program infinite?)
 It isn't
 But: bounds hold for any finite subset of the infinite set
 So: bounds for a set of `n` points on a manifold are better when `n` is very large

"All `d`manifolds are not the same"
 If `delta = O(epsilon^{d} exp(k epsilon^2//C))`, or `k = O(\epsilon^{2} (d log(1//\epsilon)+log(1//\delta)))`,
what more is there to say?
 Lowerorder terms matter:

Measures of manifold complexity
 `mu_I(M)` here denotes the surface area
 For a curve (manifold of dimension one): its length
 Determines:
 Expected hits by a random line, when `d = m-1`
 Cost of minimum spanning trees, TSP, and other extremal graphs
 Vector quantization bounds
 `mu_{:III:}(M)` here denotes the total absolute curvature
 For a curve, the total turning angle
 Determines:
 Expected number of critical points of a random height function
 A bound on the sum of the Betti numbers

Measures of manifold complexity, II
 `mu_{:II:}(M)` denotes the total root curvature
 Intermediate between `mu_I(M)` and `mu_{:III:}(M)`
 Determines:
 Complexity of best mesh approximating a surface

Manifold JL
 Baraniuk and Wakin result has additional term for `k` of (roughly):
`quad O(epsilon^{-2}(d log(m mu_I(M)//rho)))`
is enough for failure probability `delta`, where:
 `m` is (as before) the ambient dimension
 `1//rho` bounds the maximum (directional) curvature
 My result has additional term (roughly):
`quad O(epsilon^{-2}(log(mu_I(M)//tau^d + mu_{:III:}(M))))`
where:
 `tau(M)` is a low-torsion-path threshold: if `a,b in M` have `||a-b|| le tau`
then there is a low-curvature or low-torsion path between them

Why is this an improvement or interesting?
 Removed dependence on ambient dimension `m` entirely
 Roughly, replaced `1//rho`, a supremum over `M`,
by a sum, `mu_{:III:}(M)`, while also allowing low torsion
 Also showed: can use curvature measure `mu_{:II:}(M)` instead of surface area `mu_{:I:}(M)`
 `mu_{:II:}(M)` can be `ll mu_{:I:}(M)`
 Places "JL complexity" among other properties of `M` bounded by integral measures `mu_X(M)`

Areas of proof: the rest of the talk
 `epsilon`-nets, packings, coverings, measures
 Reduction theorem for drafting: from infinite sets to finite sets
 Use a sequence of finite sets to approximate the infinite set
 Use JL for all sets of the sequence
 From drafting to embedding smooth manifolds
 Call `a-b`, for `a,b in M`, a chord
 Embedding `equiv` drafting all the chords
 Call chords shorter than some parameter short, otherwise long
 For embedding: hard to handle short chords
 Why some previous results are only additive
 For smooth sets, use tangent vectors to approximate short chords
 Smoothness allows relative bounds for manifolds, instead of additive

`epsilon`-nets
 `N subset S` is an `epsilon`-net if it is both an:
 `epsilon`-covering: every point of `S` is within `epsilon` of some point of `N`
 `epsilon`-packing: points of `N` are at least `epsilon` from each other
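A standard greedy pass builds such a net; a minimal (quadratic-time) sketch:

```python
import numpy as np

# Greedy eps-net: keep a point iff it is at distance >= eps from every
# point kept so far. The kept points are an eps-packing by construction,
# and an eps-covering because every rejected point was within eps of one.
def eps_net(points, eps):
    net = []
    for p in points:
        if all(np.linalg.norm(p - q) >= eps for q in net):
            net.append(p)
    return np.array(net)

rng = np.random.default_rng(3)
S = rng.uniform(0.0, 1.0, size=(500, 2))
N = eps_net(S, 0.1)

# covering radius: max over S of the distance to the nearest net point
cover = max(min(np.linalg.norm(p - q) for q in N) for p in S)
print(len(N), cover)
```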

`epsilon`-nets and measures
 An `epsilon`-net of a `d`-dimensional set has size `C(S,epsilon) approx mu(S)//epsilon^d`
 This relation can define the dimension and the measure, for different metric spaces:
 Dimension is `d` such that `C(S,epsilon) = 1//epsilon^{:d+o(1):}` as `epsilon -> 0`
 `mu(S) approx C(S,epsilon) epsilon^d` as `epsilon -> 0`
 On a manifold `M`:
 Arc length `D_I(a,b)` corresponds to surface area `mu_I(M)`
 `D_I(a,b) ge ||a-b|| =>` an `epsilon`-net of `M` with respect to Euclidean distance has size `O(mu_I(M)//epsilon^d)`
 Total curvature `D_{III}` corresponds to total absolute curvature `mu_{:III:}`

Getting to infinite
(Very similar to [AHY], inspired by [IN])
 Theorem: an infinite set `S` of unit vectors is `epsilon`-drafted by a `k`-map:
 That is, with prob. `1 - delta`, for all `a in S`, `1 - epsilon le ||bb P a|| le 1 + epsilon`
 Where `k` depends on `log C(S,epsilon')`, for `epsilon' le epsilon`
 The proof (omitted here) uses a sequence of `epsilon_i`-nets, with `epsilon_i -> 0`

From drafting to embedding:
applying the theorem
 To apply the theorem, need `epsilon`-nets for the normalized chords
`S = U(M-M) := {(a-b)//||a-b|| quad | quad a,b in M}`

From drafting to embedding: long chords
`epsilon`-nets for long chords:
 For `U_lambda(M-M) := {(a-b)//||a-b|| quad | quad a,b in M, ||a-b|| ge lambda}`,
 If `N` is an `epsilon lambda`-cover of `M` (with respect to Euclidean distance),
 Then `U(N-N)` is a `4epsilon`-cover of `U_lambda(M-M)`
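This covering claim can be spot-checked numerically on a dense sample of the unit circle; the constants below are illustrative:

```python
import numpy as np

# Take a dense sample M of the unit circle, build a greedy (eps*lambda)-
# cover N of it, and verify that every normalized chord of M of length
# >= lambda is within 4*eps of some normalized chord of N.
ts = np.linspace(0.0, 2.0 * np.pi, 400, endpoint=False)
M = np.column_stack([np.cos(ts), np.sin(ts)])
eps, lam = 0.2, 0.5

# greedy (eps*lam)-cover of the sample
N = []
for p in M:
    if all(np.linalg.norm(p - q) >= eps * lam for q in N):
        N.append(p)
N = np.array(N)

# all normalized chords of N
chords = np.array([(x - y) / np.linalg.norm(x - y)
                   for x in N for y in N if np.linalg.norm(x - y) > 0.0])

worst = 0.0
for i in range(0, len(M), 8):
    for j in range(0, len(M), 8):
        d = np.linalg.norm(M[i] - M[j])
        if d >= lam:
            u = (M[i] - M[j]) / d
            worst = max(worst, np.linalg.norm(chords - u, axis=1).min())
print(worst)
```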

From drafting to embedding: short chords
 `U(N-N)` does not approximate the short chords
 Tangent vectors approximate very short chords
 If the total curvature of a curve from `b` to `a` is `le epsilon`,
the unit tangent vector at `b` is within `O(epsilon)` of `(a-b)//||a-b||`
 Implies a dependence on an upper bound on the curvature
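A quick numeric check of this tangent-chord bound on the unit circle (curvature 1):

```python
import numpy as np

# The arc from b = point(0) to a = point(eps) has total curvature eps,
# and the chord direction (a - b)/||a - b|| differs from the unit
# tangent at b by about eps/2, i.e. within O(eps).
def point(t):
    return np.array([np.cos(t), np.sin(t)])

def tangent(t):
    return np.array([-np.sin(t), np.cos(t)])

eps = 0.05
a, b = point(eps), point(0.0)
chord = (a - b) / np.linalg.norm(a - b)
gap = np.linalg.norm(chord - tangent(0.0))
print(gap)
```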

From drafting to embedding: short chords, II
 So an `epsilon`-cover for all unit tangent vectors
`=>` an `epsilon`-cover for all `U(a-b)`, `a,b in M`
connected by a low-curvature path
 Cover of tangents: pick a collection of tangent subspaces
 Cover `v` by some vector in nearby tangent subspace
 Collection of size `O(mu_{:III:}(M)//epsilon^d)`
 (an `epsilon`-net in the Gauss map image of `M` in the Grassmann manifold)

Short chords, tangents, reach
 Suppose `a,b in M` very close in Euclidean distance, but very far in arc length
 Then tangent at `a` or `b` has nothing to do with `ab`
 This can happen when the reach `rho` of `M` is small
 The smallest distance from a point `p` to `M`, over points `p` with two nearest neighbors in `M`
 A.K.A. the reciprocal condition number of `M`
 Can dependence on reach be avoided?

Tangent vectors `approx` short chords: planar curves
 How else to approximate short chords with tangent vectors of `M`?
 When `a,b in M` are connected by a planar curve in `M`,
that curve has a tangent vector parallel to `ab`
 "Planar" := contained in a plane (2flat).
 When `M` is a pure quadric, every `a,b in M` connected by a planar curve

Tangent vectors `approx` short chords:
lowtorsion curves
 The condition "connected by a planar curve" is too strong
 What if `a,b in M` are connected by an "almost planar" curve?
 Is there a tangent vector "almost parallel" to `a-b`?
 Yes, when "almost planar" is: total torsion of curve is small
 Torsion == twisting of osculating plane
 Curve with total torsion 0 is planar
 So: `a,b in M` connected by a low-curvature or low-torsion curve in `M`
 `=>` a nearby tangent vector
 `=>` cover by tangent subspaces
 `=>` short chords covered
 `=>` bound for projection dimension `k`

Concluding Remarks
 Results here can only be an upper bound
 Easy to understand effect of Gaussian perturbations of subset
 Newer, similar schemes to JL [AC] also work: need only be linear, and preserve vector length w.h.p.
 Better bound would be based entirely on average conditions
 Probably extendible to polyhedral manifolds
 Can we tell if the projection dimension `k` is the right one?
 For example, via statistical tests on sample
Thank you for your attention
