1. Geometric Optimization
    and/or/in
    Machine Learning,
    and Related Topics

    Ken Clarkson
    IBM Almaden Research


    Summer School Kickoff Workshop
  2. CG v. ML

    CG | ML
    VC dimension | VC dimension
    JL/sketches | JL/hash kernels
    iterative re-weighting | multiplicative weights
    "random sampling" | sample compression
    coresets | sparse greedy approximation
    backwards analysis | deletion estimates
    surface reconstruction | manifold inference
    `k`-NN graph | `k`-NN graph
    polytope approximation | non-negative matrix factorization
    Minimum Enclosing Ball | Support Vector Data Description
    Polytope distance | Training linear classifiers
    low-to-medium dimension | high dimension
    approximation error | classification error (risk)
    run-time w.r.t. `n` and `\epsilon` | run-time w.r.t. risk

  3. Outline

  4. Some Geometric Optimization Problems

  5. Quadratic Programming in the Simplex

  6. QP in the Simplex, Weak Duality

  7. QP `\rightarrow` Perceptron

  8. QP `\rightarrow` MEB

  9. A Simple Algorithm for MEB

    `x_1 \leftarrow A_1`
    for `t=1,\ldots,T`:
      `i_t \leftarrow \argmax_{i\in[n]} \norm{x_t - A_i}`   // Farthest from current
      `x_{t+1} \leftarrow \sum_{\tau\in [t]} A_{i_\tau}/t`            // Centroid of farthest points
    return `\overline{x}\leftarrow \sum_{t\in [T]} x_t/T`.

    The Frank-Wolfe algorithm, more or less.
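    The iteration above can be transcribed directly into pure Python. This is a minimal sketch: the example point set, the horizon `T`, and the first-maximum tie-breaking in the argmax are illustrative choices, not from the slides.

    ```python
    def simple_meb(points, T):
        """x_1 = A_1; at step t, find the point farthest from x_t and set
        x_{t+1} to the centroid of the farthest points chosen so far;
        return the average x_bar of the iterates x_1..x_T."""
        d = len(points[0])
        x = list(points[0])                  # x_1 <- A_1
        chosen_sum = [0.0] * d               # sum of A_{i_1}, ..., A_{i_t}
        iter_sum = [0.0] * d                 # sum of x_1, ..., x_T
        for t in range(1, T + 1):
            for k in range(d):
                iter_sum[k] += x[k]
            # i_t <- argmax_i ||x_t - A_i||   (farthest from current center)
            far = max(points,
                      key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
            for k in range(d):
                chosen_sum[k] += far[k]
            x = [s / t for s in chosen_sum]  # centroid of farthest points so far
        return [s / T for s in iter_sum]     # x_bar

    # Example: corners of a square; the MEB center is (1, 1).
    center = simple_meb([(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)], T=500)
    ```

    On this instance the iterates oscillate around the true center, and the averaged iterate `x_bar` approaches `(1, 1)` at the `O(\log T)/T` rate the later analysis slides establish.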
  10. Demo

  11. Analysis: Regret

  12. Analysis, using Regret

    `\eqalign{ R_{A'}^2 & = \min_x \max_{t\in [T]} c_t(x) \\ & \ge \min_x \frac{1}{T}\sum_{t \in [T]} c_t(x) & \quad \textrm{(max }\ge \textrm{ average)} \\ & \ge \frac{1}{T}\sum_{t \in [T]} c_t(x_t) - O(\log T)/T & \quad \textrm{(regret bound)} \\ & \ge \frac{1}{T}\sum_{t\in [T]} \norm{x_t - A_i}^2 - O(\log T)/T & \quad \textrm{(for each } i\in[n]) \\ & \ge \norm{\overline{x} - A_i}^2 - O(\log T)/T & \quad \textrm{(convexity of } \norm{x-A_i}^2) \\ }`
  13. Analysis, Wrapup

  14. Related Work: Coresets,
    Low-Rank Approximation


  15. Coresets, via Regret

  16. Online Convex Optimization, More Generally

  17. Online Optimization via "Fictitious Play"

  18. RFP `\rightarrow` Gradient Descent

  19. RFP `\rightarrow` Multiplicative Weights

  20. Pegasos, Algorithm Sketch

  21. Pegasos, Analysis Sketch

  22. Faster Distance Evaluations for MEB Algorithm?

  23. Faster Distance Evaluations: JL

  24. Faster Crude Distance Estimates: Sampling

  25. Estimating `x^\top A`

  26. Tail Estimate from Bernstein

  27. Applying Sampling and Bernstein

  28. `\ell_2` Sampling

  29. `\ell_2` Sampling


    Pick `j` with probability `\rho(j)` proportional to `\color{green}{x(j)}^2`
    Return estimate row vector `v\leftarrow \textrm{Clip}(\frac{1}{\rho(j)} x(j) A_{j:}, 2\epsilon)`
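    A pure-Python sketch of the sampling step: pick `j` with probability proportional to `x(j)^2` and return the reweighted row, whose expectation is `x^\top A`. The Clip step from the slide is omitted here for simplicity, so this shows only the unbiased part of the estimator; the instance and sample count are illustrative.

    ```python
    import random

    def l2_sample_estimate(x, A, rng):
        """Pick j with probability rho(j) = x[j]^2 / ||x||_2^2, then
        return the row vector (1/rho(j)) * x[j] * A[j]."""
        norm2 = sum(v * v for v in x)
        r = rng.random() * norm2         # inverse-CDF sampling over rho
        acc = 0.0
        for j, v in enumerate(x):
            acc += v * v
            if r <= acc:
                break
        rho = x[j] * x[j] / norm2
        return [(x[j] / rho) * a for a in A[j]]

    rng = random.Random(0)
    x = [1.0, -2.0, 3.0]
    A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # true x^T A = [4.0, 1.0]
    n = 200_000
    avg = [0.0] * len(A[0])
    for _ in range(n):
        for k, v in enumerate(l2_sample_estimate(x, A, rng)):
            avg[k] += v / n
    ```

    Averaging many independent samples drives `avg` toward the true `x^\top A`; a single sample can be far off, which is what the clipping and the Bernstein-type tail bounds on the following slides address.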
  30. Clipped Bernstein

  31. Other Concentrated Estimators
    Using Only Variance Bounds


  32. `\ell_2` Sampling as Low-Rent Feature Selection

  33. Kernelization

  34. Matrix-Matrix Multiplication via Sampling

  35. Matrix Bernstein

  36. Isotropic Sparsification

  37. Isotropic Sparsification, II

  38. Spectral Sparsification

  39. What Can Be Estimated
    if `D` is Known


  40. Finding the Weights?

  41. `D` as Almost-a-Coreset for MEE

  42. `D` as Almost-a-Coreset for MEE, II

  43. `D` as Almost-a-Coreset for MEE, III

  44. A Primal-Dual MEB Algorithm

  45. Max `p` vs. Min `x`

    For rounds `t=1,\ldots,T`, two players respond to each other with `p_t` and `x_t`
  46. A Primal-Dual Algorithm

    The resulting algorithm, in pseudo-code:
    `p_1\leftarrow (\frac{1}{n}, \frac{1}{n}, \ldots, \frac{1}{n})`;  `x_1 \leftarrow A p_1`;
    for `t=1,\ldots,T-1`:
       for each `i`: `p_{t+1}(i) \leftarrow p_t(i)\exp(\eta\norm{x_t - A_i}^2)`
       `p_{t+1}\leftarrow p_{t+1}/\norm{p_{t+1}}_1`
       `x_{t+1} \leftarrow (1-\frac{1}{t})x_t + \frac{1}{t}A p_{t}`
          // So `x_{t+1} = A \sum_{\tau\in [t]} p_{\tau}/t`
    return `\overline{x} \leftarrow \sum_{t\in[T]} x_t/T`;
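    The pseudo-code above can be transcribed into pure Python as follows. Note that `A p_t` must be taken with the weights from *before* the multiplicative update. The step size `eta` and the example instance are illustrative choices, not values from the slides.

    ```python
    import math

    def primal_dual_meb(points, T, eta):
        """Primal-dual MEB iteration: multiplicative-weights updates on p,
        a running average for x; returns x_bar = (x_1 + ... + x_T) / T."""
        n, d = len(points), len(points[0])
        p = [1.0 / n] * n                                  # p_1: uniform
        x = [sum(p[i] * points[i][k] for i in range(n))    # x_1 = A p_1
             for k in range(d)]
        iter_sum = list(x)
        for t in range(1, T):
            # A p_t, using p before the update below
            Ap = [sum(p[i] * points[i][k] for i in range(n)) for k in range(d)]
            # p_{t+1}(i) proportional to p_t(i) * exp(eta * ||x_t - A_i||^2)
            w = [p[i] * math.exp(eta * sum((x[k] - points[i][k]) ** 2
                                           for k in range(d)))
                 for i in range(n)]
            total = sum(w)
            p = [wi / total for wi in w]
            # x_{t+1} = (1 - 1/t) x_t + (1/t) A p_t
            x = [(1 - 1.0 / t) * x[k] + Ap[k] / t for k in range(d)]
            for k in range(d):
                iter_sum[k] += x[k]
        return [c / T for c in iter_sum]                   # x_bar

    # Example in 1-D: MEB center of {0, 1, 0.2} is 0.5.
    center = primal_dual_meb([(0.0,), (1.0,), (0.2,)], T=5000, eta=0.1)
    ```

    On this instance the weight on the interior point `0.2` decays, `p` concentrates on the two extreme points, and `x_bar` approaches the center `0.5`.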
  47. The Primal-Dual Algorithm is Not Too Bad

  48. Randomizing the Primal-Dual Algorithm

  49. The Randomized Primal-Dual Algorithm is
    Not Too Bad


  50. Randomized vs. Deterministic

  51. Related Work: Prior Primal-Dual Algorithms

  52. Concluding Remarks