  1. Geometric Optimization
    and/or/in
    Machine Learning,
    and Related Topics

    Ken Clarkson
    IBM Almaden Research


    Summer School Kickoff Workshop
  2. CG v. ML

    CG                              | ML
    VC dimension                    | VC dimension
    JL/sketches                     | JL/hash kernels
    iterative re-weighting          | multiplicative weights
    "random sampling"               | sample compression
    coresets                        | sparse greedy approximation
    backwards analysis              | deletion estimates
    surface reconstruction          | manifold inference
    k-NN graph                      | k-NN graph
    polytope approximation          | non-negative matrix factorization
    Minimum Enclosing Ball          | Support Vector Data Description
    Polytope distance               | Training linear classifiers
    low-to-medium dimension         | high dimension
    approximation error             | classification error (risk)
    run-time w.r.t. n and \epsilon  | run-time w.r.t. risk

  3. Outline

  4. Some Geometric Optimization Problems

  5. Quadratic Programming in the Simplex

  6. QP in the Simplex, Weak Duality

  7. QP \rightarrow Perceptron

  8. QP \rightarrow MEB

  9. A Simple Algorithm for MEB

    x_1 \leftarrow A_1
    for t=1,\ldots,T:
      i_t \leftarrow \argmax_{i\in[n]} \norm{x_t - A_i}   // Farthest from current
      x_{t+1} \leftarrow \sum_{\tau\in [t]} A_{i_\tau}/t            // Centroid of farthest points
    return \overline{x}\leftarrow \sum_{t\in [T]} x_t/T.

    The Frank-Wolfe algorithm, more or less.
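    As a concreteness check, the loop above can be sketched in NumPy. This is a sketch under the slide's setup; the function name `simple_meb`, the concrete `T`, and the test points below are illustrative choices, not from the slides.

    ```python
    import numpy as np

    def simple_meb(A, T=2000):
        """Sketch of the slide's iterative MEB heuristic (Frank-Wolfe-like).

        A: (n, d) array of points A_1..A_n.
        Returns x_bar, the average of the iterates x_1..x_T.
        """
        n, d = A.shape
        x = A[0].astype(float)      # x_1 <- A_1
        far_sum = np.zeros(d)       # running sum of the farthest points A_{i_tau}
        x_sum = np.zeros(d)         # running sum of the iterates x_t
        for t in range(1, T + 1):
            x_sum += x
            i_t = np.argmax(np.linalg.norm(A - x, axis=1))  # farthest from current x_t
            far_sum += A[i_t]
            x = far_sum / t         # x_{t+1}: centroid of farthest points so far
        return x_sum / T            # x_bar
    ```

    On the corners of the square [0,2]^2, for example, the iterates oscillate between reweightings of opposite corners, but the average x_bar empirically settles near the true MEB center (1,1), consistent with the regret analysis on the following slides.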
  10. Demo

  11. Analysis: Regret

  12. Analysis, using Regret

    \eqalign{
      R_{A'}^2 & = \min_x \max_{t\in [T]} c_t(x) \\
        & \ge \min_x \frac{1}{T}\sum_{t \in [T]} c_t(x) & \quad \textrm{(max } \ge \textrm{ average)} \\
        & \ge \frac{1}{T}\sum_{t \in [T]} c_t(x_t) - O(\log T)/T & \quad \textrm{(regret bound)} \\
        & \ge \frac{1}{T}\sum_{t\in [T]} \norm{x_t - A_i}^2 - O(\log T)/T & \quad \textrm{(for each } i\in[n]\textrm{)} \\
        & \ge \norm{\overline{x} - A_i}^2 - O(\log T)/T & \quad \textrm{(convexity of } \norm{x-A_i}^2\textrm{)}
    }
  13. Analysis, Wrapup

  14. Related Work: Coresets,
    Low-Rank Approximation


  15. Coresets, via Regret

  16. Online Convex Optimization, More Generally

  17. Online Optimization via "Fictitious Play"

  18. RFP \rightarrow Gradient Descent

  19. RFP \rightarrow Multiplicative Weights

  20. Pegasos, Algorithm Sketch

  21. Pegasos, Analysis Sketch

  22. Faster Distance Evaluations for MEB Algorithm?

  23. Faster Distance Evaluations: JL

  24. Faster Crude Distance Estimates: Sampling

  25. Estimating x^\top A

  26. Tail Estimate from Bernstein

  27. Applying Sampling and Bernstein

  28. \ell_2 Sampling

  29. \ell_2 Sampling


    Pick j with probability \rho(j) proportional to x(j)^2.
    Return the estimate row vector v \leftarrow \textrm{Clip}(\frac{1}{\rho(j)} x(j) A_{j:}, 2\epsilon).
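    A hedged sketch of this sampling step in NumPy. The function name `l2_sample_estimate` and the clip semantics (coordinate-wise truncation to [-c, c]) are assumptions on my part, since the slide writes Clip(·, 2ε) without defining it here.

    ```python
    import numpy as np

    def l2_sample_estimate(x, A, clip=None, rng=None):
        """One l2-sampling estimate of the row vector x^T A.

        Samples index j with probability rho(j) = x(j)^2 / ||x||_2^2,
        then returns (x(j) / rho(j)) * A[j, :], which is unbiased:
        E[v] = sum_j x(j) A[j, :] = x^T A.  Clipping (if used) trades
        a small bias for bounded variance.
        """
        rng = np.random.default_rng(rng)
        rho = x * x / np.dot(x, x)         # sampling distribution, sums to 1
        j = rng.choice(len(x), p=rho)
        v = (x[j] / rho[j]) * A[j, :]
        if clip is not None:
            v = np.clip(v, -clip, clip)    # coordinate-wise truncation (assumed)
        return v
    ```

    Averaging many independent estimates recovers x^T A, while each estimate touches only a single row of A.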
  30. Clipped Bernstein

  31. Other Concentrated Estimators
    Using Only Variance Bounds


  32. \ell_2 Sampling as Low-Rent Feature Selection

  33. Kernelization

  34. Matrix-Matrix Multiplication via Sampling

  35. Matrix Bernstein

  36. Isotropic Sparsification

  37. Isotropic Sparsification

  38. Spectral Sparsification

  39. What Can Be Estimated
    if D is Known


  40. Finding the Weights?

  41. D as Almost-a-Coreset for MEE

  42. D as Almost-a-Coreset for MEE, II

  43. D as Almost-a-Coreset for MEE, III

  44. A Primal-Dual MEB Algorithm

  45. Max p vs. Min x

    For rounds t=1,\ldots,T, the two players respond to each other with p_t and x_t.
  46. A Primal-Dual Algorithm

    The resulting algorithm, in pseudo-code:
    p_1 \leftarrow (\frac{1}{n}, \frac{1}{n}, \ldots, \frac{1}{n});
    x_1 \leftarrow A p_1;
    for t=1,\ldots,T-1:
       for each i: p_{t+1}(i) \leftarrow p_t(i)\exp(\eta||x_{t} - A_i||^2)
       p_{t+1}\leftarrow p_{t+1}/||p_{t+1}||_1
       x_{t+1} \leftarrow (1-\frac{1}{t})x_t + \frac{1}{t}A p_{t}
          // So x_{t+1} = A \sum_{\tau\in [t]} p_{\tau}/t
    return \overline{x} \leftarrow \sum_{t\in[T]} x_t/T;
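    The pseudo-code above can be sketched in NumPy as follows. The function name and the concrete step size η are mine (the slides leave η symbolic); note that, per the slide, the x-update uses the pre-update weights p_t.

    ```python
    import numpy as np

    def primal_dual_meb(A, T=2000, eta=0.05):
        """Sketch of the primal-dual MEB algorithm above.

        p: distribution on the n points (dual), updated multiplicatively
        by the squared distances; x: primal center, so that
        x_{t+1} = A @ (average of p_1..p_t).
        """
        n, d = A.shape
        p = np.full(n, 1.0 / n)                # p_1 uniform
        x = A.T @ p                            # x_1 <- A p_1
        x_sum = x.copy()                       # accumulates x_1..x_T
        for t in range(1, T):
            p_prev = p
            d2 = np.sum((A - x) ** 2, axis=1)  # ||x_t - A_i||^2 for each i
            p = p_prev * np.exp(eta * d2)      # multiplicative weights update
            p /= p.sum()                       # renormalize to a distribution
            x = (1 - 1.0 / t) * x + (A.T @ p_prev) / t   # uses p_t, per the slide
            x_sum += x
        return x_sum / T                       # x_bar
    ```

    The multiplicative update pushes dual weight onto the points farthest from the current center, while the primal average pulls the center toward the weighted centroid of those points.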
  47. The Primal-Dual Algorithm is Not Too Bad

  48. Randomizing the Primal-Dual Algorithm

  49. The Randomized Primal-Dual Algorithm is
    Not Too Bad


  50. Randomized vs. Deterministic

  51. Related Work: Prior Primal-Dual Algorithms

  52. Concluding Remarks