In this talk, I will survey work on using linear sketching for solving fundamental graph problems such as finding matchings, evaluating edge and node connectivity, constructing graph sparsifiers, correlation clustering, and estimating the number of triangles. Time permitting, we will also discuss other combinatorial problems such as set cover and submodular maximization.
I will discuss a tool called sketching, a form of data dimensionality reduction, and its applications to several problems in high-dimensional geometry. In particular, I will show how to obtain the fastest possible algorithms for fundamental problems such as projection onto a flat, and also study generalizations of projection onto more complicated objects such as unions of flats or subspaces. Some of these problems are simply least squares regression problems, with many applications in machine learning. I will also discuss low-rank approximation, with applications to clustering. Finally, I will mention a number of other applications of sketching in machine learning, numerical linear algebra, and optimization.
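As a concrete illustration of the sketching idea for least squares regression, here is a minimal "sketch and solve" example using a dense Gaussian projection (a toy baseline; the fastest algorithms referred to above use more structured sketches):

```python
import numpy as np

rng = np.random.default_rng(0)

# Overconstrained least squares: n >> d.
n, d = 5000, 10
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true + 0.01 * rng.standard_normal(n)

# Exact solution of min_x ||Ax - b||_2.
x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)

# Sketch-and-solve: project A and b with a random Gaussian matrix S of
# m << n rows, then solve the much smaller m x d problem instead.
m = 200
S = rng.standard_normal((m, n)) / np.sqrt(m)
x_sketch, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)

# The sketched solution nearly attains the optimal residual norm.
res_exact = np.linalg.norm(A @ x_exact - b)
res_sketch = np.linalg.norm(A @ x_sketch - b)
print(res_sketch / res_exact)  # close to 1
```

The dimension reduction is from 5000 rows to 200, yet the residual of the sketched solution is within a small factor of optimal with high probability.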
Coffee break
In this talk I will describe a new study of linear sketching that focuses on understanding the power of linear sketches based on parities (i.e., over GF_2, the field of two elements), in contrast to previous work that uses real arithmetic. I will illustrate various properties of such sketches using Fourier-analytic methods and tools from communication complexity. In particular, linear sketching over GF_2 turns out to be closely related to Fourier sparsity with respect to L_p norms. Moreover, it can be shown to be optimal in streaming and distributed settings for data generated according to the uniform distribution.
Joint work with Sampath Kannan (UPenn), Elchanan Mossel (MIT) and Swagato Sanyal (NUS).
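A minimal illustration of a parity-based sketch and its linearity over GF_2 (a toy construction, not one of the specific sketches analyzed in the talk):

```python
import numpy as np

rng = np.random.default_rng(1)

# A linear sketch over GF_2: multiply the input x in {0,1}^n by a random
# 0/1 matrix M, with all arithmetic mod 2.  Each sketch coordinate is
# the parity of a random subset of input bits.
n, k = 64, 8
M = rng.integers(0, 2, size=(k, n))

def sketch(x):
    return (M @ x) % 2

x = rng.integers(0, 2, size=n)
y = rng.integers(0, 2, size=n)

# Linearity over GF_2: the sketch of x XOR y is the XOR of the sketches.
# This is what lets such a sketch process a stream of single-bit flips.
print(np.array_equal(sketch(x ^ y), sketch(x) ^ sketch(y)))  # True
```

Linearity holds because the cross term in x XOR y = x + y - 2xy vanishes mod 2.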
Lunch (on your own)
Linear sketches have proven quite successful in recent years for sketching graphs in various problems, including edge and vertex connectivity, cut and spectral sparsifiers, densest subgraph, subgraph counting, and many others. These sketches compress an input graph, through a linear projection, to a small summary, typically of size proportional to the number of vertices in the graph, i.e., quadratically smaller than the input size, while preserving the desired properties of the graph to within an arbitrary precision.
In this talk, we consider linear sketches for the maximum matching problem. We show that any linear sketch that can provide even a modest approximation ratio of n^{o(1)} to the largest matching requires n^{2-o(1)} space, i.e., essentially no better linear sketch exists for the matching problem than simply storing the entire input graph. We further consider the problem of estimating the size of a maximum matching, as opposed to finding the actual edges in the matching, and show that obtaining a (1+eps)-approximation to this problem requires linear sketches of almost quadratic space, i.e., n^{2-O(eps)} bits. A well-known connection between matching size estimation and computing the rank of Tutte matrices allows us to carry our lower bound results over to the matrix rank estimation problem, and we show that almost quadratic space is also necessary for any linear sketch that can approximate the rank of a matrix to within an arbitrary precision.
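For intuition on why linearity is so useful for graph sketching in the first place, here is a toy XOR summary in the spirit of cut and connectivity sketches (unrelated to the lower-bound constructions of this talk):

```python
# Each vertex stores the XOR of the IDs of its incident edges.  XOR-ing
# the summaries of a vertex set S cancels every edge with both endpoints
# inside S, leaving the XOR of the edges crossing the cut (S, V \ S); if
# exactly one edge crosses, its ID pops out directly.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (2, 4)]
n = 5

def edge_id(u, v):
    u, v = min(u, v), max(u, v)
    return u * n + v          # unique integer ID per undirected edge

summary = [0] * n
for (u, v) in edges:
    summary[u] ^= edge_id(u, v)
    summary[v] ^= edge_id(u, v)

S = {4}
cut_xor = 0
for s in S:
    cut_xor ^= summary[s]
# Exactly one edge, (2, 4), crosses the cut, so its ID is recovered.
print(cut_xor == edge_id(2, 4))  # True
```

Combined with random sampling to reduce general cuts to the single-crossing-edge case, this trick underlies the vertex-proportional-size connectivity sketches mentioned above.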
In this talk we will discuss universal sketches for symmetric norms and G-sum functions and their applications to software-defined networks.
First, we will discuss the streaming space complexity of a symmetric norm L (a norm on R^n invariant under sign-flips and coordinate permutations), by relating this space complexity to the measure-concentration characteristics of L. Specifically, we provide nearly matching upper and lower bounds on the space complexity of calculating a (1 \pm \epsilon)-approximation to the norm of the stream. (The bounds match up to polylogarithmic factors.) All of the bounds depend on the median of L(x) when x is drawn uniformly from the Euclidean unit sphere. The same median governs many phenomena in high-dimensional spaces, such as large-deviation bounds and the critical dimension in Dvoretzky's theorem. The family of symmetric norms contains several well-studied norms, such as all L_p norms, and indeed we provide a new explanation for the disparity in space complexity between p ≤ 2 and p > 2. In addition, we apply our general results to easily derive bounds for several norms that were not studied before in the streaming model, including the top-k norm and the k-support norm, which was recently employed for machine learning tasks. Our generic algorithm for symmetric norms is based on a universal sketch. Specifically, for every s > 0, there is a single sketch of size s * poly(log(n)/\epsilon) that yields a (1 \pm \epsilon)-approximation for every symmetric norm whose streaming space complexity is at most s.
Second, we will discuss the space complexity of approximating \sum_{i=1}^{n} g(|f_i|) (a G-sum), where {f_i}_{i \in [n]} is the frequency vector of a turnstile stream. This is a generalization of the well-known frequency moments problem, and previous results apply only when g is monotonic or has a special functional form. Our contribution is to give a condition such that, except for a narrow class of functions g, there is a space-efficient approximation algorithm for the sum if and only if g satisfies the condition. The functions g that we are able to characterize include all convex, concave, monotonic, polynomial, and trigonometric functions, among many others, and ours is the first such characterization for non-monotonic functions. Thus, for nearly all functions of one variable, we answer the open question from the celebrated paper of Alon, Matias and Szegedy (1996). Our sketch for G-sum functions is universal as well.
The talk is based on a joint work with Jaroslaw Blasiok (Harvard), Steve Chestnut (ETH Zurich), Robert Krauthgamer (Weizmann), and Lin F. Yang (Princeton) (STOC 2017) and a joint work with Steve Chestnut, David Woodruff (CMU), and Lin F. Yang (PODS 2016).
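For reference, the classic AMS sketch for the second frequency moment F_2 (the G-sum with g(x) = x^2, from the Alon-Matias-Szegedy paper cited above) illustrates the kind of small linear sketch the characterization covers; this is the textbook construction, not the universal sketch of the talk:

```python
import numpy as np

rng = np.random.default_rng(2)

# AMS sketch for F_2 = sum_i f_i^2: maintain r counters
# z_j = sum_i s_j(i) * f_i for independent random signs s_j(i) in
# {-1, +1}.  Each z_j^2 is an unbiased estimate of F_2, and averaging
# over the r counters reduces the variance.
n, r = 1000, 400
signs = rng.choice([-1.0, 1.0], size=(r, n))

f = np.zeros(n)   # true frequencies, kept only to check the estimate
z = np.zeros(r)   # the sketch: r counters

def update(i, delta):              # turnstile update: f_i += delta
    f[i] += delta
    z[:] += signs[:, i] * delta

for _ in range(5000):
    update(rng.integers(n), rng.choice([-1, 1]))

estimate = np.mean(z ** 2)
true_f2 = np.sum(f ** 2)
print(estimate / true_f2)          # concentrates around 1
```

The sketch is linear in the frequency vector, so it handles deletions (negative deltas) for free, exactly the turnstile setting discussed above.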
We give the first oblivious dimensionality reduction method for the overconstrained Tukey (bisquare) regression problem. The Tukey loss function is quadratic, like least squares, for residual errors smaller than a prescribed threshold tau, but it becomes constant for errors above tau. This makes the Tukey measure more robust to outliers than least squares, as well as other measures such as least absolute deviation or Huber, which have at least linear growth. Given an instance min over x in R^d of ||Ax − b||_T of overconstrained Tukey regression, we show how to reduce it to a smaller problem min over x in R^d of ||SAx − Sb||_T using a sketching matrix S, such that the solution of a (weighted) Tukey regression problem gives an O(log n)-approximation to the original problem. By slightly modifying the estimator on the small problem, we achieve a C-approximation for a constant C >= 1. Importantly, S has only poly(d log n) rows, and SA and Sb can be computed in nnz(A) log n time, where nnz(A) is the number of non-zero entries of A. Since we reduce Tukey regression to a smaller version of itself, any algorithm or heuristic for the original problem can now be run on the much smaller poly(d log n)-sized problem. Our mapping S is simple and easy to implement, and we give empirical results demonstrating its practicality. Our work opens up the possibility of using sketching for a much broader set of non-convex loss functions previously not amenable to the “sketch and solve” paradigm, a paradigm that has proven very successful for solving regression with respect to other loss functions such as the L_p norms.
Joint work with David Woodruff.
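A minimal sketch of the loss function as described above (quadratic below the threshold tau, constant above it), showing why a single outlier cannot dominate the objective:

```python
import numpy as np

# Simplified Tukey loss as described in the abstract: r^2 for |r| < tau,
# and the constant tau^2 for |r| >= tau, capping the influence of any
# single outlier.
def tukey_loss(residuals, tau=1.0):
    r = np.abs(residuals)
    return np.where(r < tau, r ** 2, tau ** 2)

# One huge outlier dominates the least-squares objective but contributes
# only tau^2 = 1 to the Tukey objective.
res = np.array([0.1, -0.2, 0.05, 100.0])
print(np.sum(res ** 2))          # ~10000, driven by the outlier
print(np.sum(tukey_loss(res)))   # ~1.05
```

The flat region is also what makes the objective non-convex, which is why the usual "sketch and solve" analyses for convex losses do not apply directly.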
Finding heavy hitters in a data stream, also known as elephants or frequent items, is one of the most practically important and core problems in the study of streaming algorithms. The most basic form of the problem is simple: given a long stream of elements coming from some large universe, the goal is to report all frequent items in the stream using an algorithm with very low memory consumption, much smaller than both the stream length and universe size. In the theoretical study of streaming algorithms, finding heavy hitters has played a key role as a subroutine in solving many other problems.
In this talk we present a new heavy hitters algorithm, ExpanderSketch, which reduces the heavy hitters problem to a new problem we define concerning graph clustering, an area of interest in its own right. The particular formulation we focus on is finding clusters with low external conductance and good internal connectivity, with the goal of identifying all clusters of this type even when they are much smaller than the graph containing them. Our focus is on theoretical results, with constants that may be unacceptable in practice, but the underlying ideas are simple and have potential practical impact.
This is joint work with Kasper Green Larsen, Jelani Nelson, and Mikkel Thorup.
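For contrast with ExpanderSketch, the classic Misra-Gries summary solves the basic insertion-only form of the heavy hitters problem in tiny memory; it is shown here purely as a baseline, not as the talk's algorithm:

```python
# Misra-Gries summary: with k counters it reports every item occurring
# more than len(stream)/(k+1) times, using memory independent of both
# the stream length and the universe size.
def misra_gries(stream, k):
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k:
            counters[item] = 1
        else:
            # All counters occupied: decrement everything, drop zeros.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters

stream = ["a"] * 50 + ["b"] * 30 + list("cdefghij") * 2
candidates = misra_gries(stream, k=3)
print(sorted(candidates))  # ['a', 'b']
```

Here the stream has 96 elements, so with k = 3 every item occurring more than 24 times is guaranteed to survive; the heavy items "a" and "b" do, while the infrequent ones are cancelled away.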
Coffee break
The problem of approximately computing the $k$ dominant Fourier coefficients of a vector $X$ quickly, using few samples in the time domain, is known as the Sparse Fourier Transform (sparse FFT) problem. A long line of work on the sparse FFT has resulted in algorithms with $O(k\log n\log (n/k))$ runtime [Hassanieh \emph{et al.}, STOC'12] and $O(k\log n)$ sample complexity [Indyk \emph{et al.}, FOCS'14]. These results are proved using non-adaptive algorithms, and the latter $O(k\log n)$ sample complexity result is essentially the best possible under the sparsity assumption alone: It is known that even adaptive algorithms must use $\Omega((k\log(n/k))/\log\log n)$ samples [Hassanieh \emph{et al.}, STOC'12]. By {\em adaptive}, we mean being able to exploit previous samples in guiding the selection of further samples.
This paper revisits the sparse FFT problem with the added twist that the sparse coefficients approximately obey a $(k_0,k_1$)-block sparse model. In this model, signal frequencies are clustered in $k_0$ intervals with width $k_1$ in Fourier space, where $k= k_0k_1$ is the total sparsity. Signals arising in applications are often well approximated by this model with $k_0\ll k$.
Our main result is the first sparse FFT algorithm for $(k_0, k_1)$-block sparse signals with the sample complexity of $O^*(k_0k_1 + k_0\log(1+ k_0)\log n)$ at constant signal-to-noise ratios, and sublinear runtime. A similar sample complexity was previously achieved in works on {\em model-based compressive sensing} using random Gaussian measurements, but with $\Omega(n)$ runtime. To the best of our knowledge, our result is the first sublinear-time algorithm for model-based compressed sensing, and the first sparse FFT result that goes below the $O(k\log n)$ sample complexity bound.
(Joint work with Volkan Cevher, Jonathan Scarlett and Amir Zandieh)
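A toy construction of a $(k_0, k_1)$-block sparse signal, using a full FFT as a sanity check (the talk's algorithm is sublinear-time and far subtler; this only illustrates the signal model):

```python
import numpy as np

rng = np.random.default_rng(3)

# (k0, k1)-block sparse model in Fourier space: k0 clusters of k1
# consecutive nonzero frequencies each, total sparsity k = k0 * k1.
n, k0, k1 = 1024, 3, 8
spectrum = np.zeros(n, dtype=complex)
starts = rng.choice(n - k1, size=k0, replace=False)
for s in starts:
    spectrum[s:s + k1] = rng.standard_normal(k1) + 1j * rng.standard_normal(k1)

x = np.fft.ifft(spectrum)        # time-domain signal (what we sample)
recovered = np.fft.fft(x)        # full n log n FFT as a sanity check

# The support is concentrated in at most k0 intervals of width k1.
support = np.flatnonzero(np.abs(recovered) > 1e-8)
print(len(support) <= k0 * k1)   # True
```

Signals well approximated by this model with $k_0 \ll k$ are exactly the ones for which the sample complexity above beats the generic $O(k\log n)$ bound.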
The goal of compressed sensing is to make use of image structure to estimate an image from a small number of linear measurements. The structure is typically represented by sparsity in a well-chosen basis. We show how to achieve guarantees similar to standard compressed sensing but without employing sparsity at all -- instead, we suppose that vectors lie near the range of a generative model G: R^k -> R^n.
Our main theorem is that, if G is L-Lipschitz, then roughly O(k log L) random Gaussian measurements suffice for an L_2/L_2 recovery guarantee; this is O(k d log n) for typical d-layer networks. We demonstrate our results using generative models from published variational autoencoders and generative adversarial networks. Our method can use 5-10x fewer measurements than Lasso for the same accuracy.
This is joint work with Ashish Bora, Ajil Jalal, and Alex Dimakis.
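A toy version of recovery with a generative prior, using a linear stand-in G(z) = Wz instead of a trained network (the setting above uses Lipschitz deep generative models and gradient-based recovery; this only illustrates why m measurements proportional to the latent dimension k can suffice):

```python
import numpy as np

rng = np.random.default_rng(4)

# Signal dimension n, latent dimension k, number of measurements m.
# With the signal constrained to the k-dimensional range of G, m = O(k)
# Gaussian measurements suffice, even though m << n would be hopeless
# without the prior.
k, n, m = 5, 200, 15
W = rng.standard_normal((n, k))    # hypothetical linear "generator"
A = rng.standard_normal((m, n))    # Gaussian measurement matrix

z_star = rng.standard_normal(k)
x_star = W @ z_star                # signal in the range of G
y = A @ x_star                     # m << n noiseless measurements

# Recover by least squares over the latent code z (for this linear toy
# the problem is convex; deep generators require gradient descent).
z_hat, *_ = np.linalg.lstsq(A @ W, y, rcond=None)
x_hat = W @ z_hat
print(np.linalg.norm(x_hat - x_star))  # ~0: exact recovery
```

With A @ W a 15 x 5 matrix of full column rank almost surely, the latent code is identified exactly; the theorem above extends this intuition to nonlinear L-Lipschitz generators at the cost of the log L factor.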
Sepehr Assadi is a PhD student in Computer Science at the University of Pennsylvania, advised by Sanjeev Khanna. His research interests are primarily in the theoretical foundations of big data analysis, in particular streaming and distributed algorithms and lower bounds. He is also interested in approximation and online algorithms, communication complexity, and algorithmic game theory. He obtained his B.Sc. degree from Sharif University of Technology in 2013. Sepehr has received best paper awards at WINE 2015 and SPAA 2017, and the best student paper award at PODS 2017.
Vladimir Braverman is an Assistant Professor in the Department of Computer Science in the Whiting School of Engineering at Johns Hopkins University. He received his PhD from UCLA in 2011. His main research interests include streaming and sketching algorithms. His research has been supported by DARPA, NSF, Google, Cisco, and Nvidia. Braverman received an NSF CAREER Award in 2017.
Ken Clarkson has been a Research Staff Member at the IBM Almaden Research Center since 2007, and is manager of the theory group there. He was a Member of Technical Staff at Bell Labs from 1984 until 2007, after receiving his Ph.D. in computer science from Stanford University and BA in mathematics from Pomona College. His research has largely been in geometric algorithms and numerical linear algebra, with a sabbatical on tools for cellular basestation optimization. He was principal investigator for a DARPA project on sketching-based matrix computations, the recipient of Best Paper Awards at the Vehicular Technology Conference 2006 and STOC 2013, and is a Fellow of the ACM. He has held a First-Class Radio-Telephone Operator's License.
Huy Le Nguyen is an Assistant Professor of Computer Science in the College of Computer and Information Science at Northeastern University. Prior to joining Northeastern, he was a Research Assistant Professor at the Toyota Technological Institute in Chicago and before that, a Google Research Fellow at the Simons Institute at University of California, Berkeley. He received his PhD in Computer Science from Princeton University. Professor Nguyen is broadly interested in the design and analysis of algorithms, with an emphasis on algorithmic techniques for massive data sets and machine learning.
Michael Kapralov is an Assistant Professor in the School of Computer and Communication Sciences at EPFL. Michael obtained his Ph.D. from Stanford University in 2012, after which he spent two years as a postdoc in the Theory of Computation Group at MIT CSAIL, and a year as a Herman Goldstine Postdoctoral Fellow at the IBM T.J. Watson Research Center. He is broadly interested in theoretical computer science, with an emphasis on theoretical foundations of big data analysis. Most of his algorithmic work is in sublinear algorithms, where specific directions include streaming, sketching, sparse recovery and Fourier sampling.
Andrew McGregor is an Associate Professor at the University of Massachusetts, Amherst. He received a B.A. degree and the Certificate of Advanced Study in Mathematics from the University of Cambridge and a Ph.D. from the University of Pennsylvania. He also spent a couple of years as a post-doc at UC San Diego and Microsoft Research SVC. He is interested in many areas of theoretical computer science and specializes in data stream algorithms, linear sketching, and communication complexity. He received the NSF CAREER Award in 2010.
David Woodruff is an Associate Professor at Carnegie Mellon University in the School of Computer Science. Prior to that he spent ten years at the IBM Almaden Research Center, which he joined in 2007 after completing his Ph.D. at MIT in theoretical computer science. His research interests include communication complexity, data stream algorithms, machine learning, numerical linear algebra, sketching, and sparse recovery. He is the author of the book "Sketching as a Tool for Numerical Linear Algebra". He is a recipient of the 2014 Presburger Award and Best Paper Awards at STOC 2013 and PODS 2010. At IBM he was a member of the Academy of Technology and a Master Inventor.
Grigory Yaroslavtsev is an Assistant Professor of Computer Science at Indiana University. Prior to that he was a postdoctoral fellow at the Warren Center for Network and Data Sciences at the University of Pennsylvania. He was previously a Postdoctoral Fellow in Mathematics at Brown University, ICERM. He received his Ph.D. in Theoretical Computer Science in 2014 from Pennsylvania State University and an M.Sc. in Applied Mathematics and Physics from the Academic University of the Russian Academy of Sciences in 2010. Grigory works on efficient algorithms for sparsification, summarization, and testing properties of large data, including approximation, parallel and online algorithms, learning theory and property testing, communication and information complexity, and private data release.