Distinct Elements in Streams and the Klee's Measure Problem

Title Of the Talk: “ Distinct Elements in Streams and the Klee’s Measure Problem “
Speakers: Prof. Sourav Chakraborty (ISI, Kolkata) Host Faculty: Dr. Nitin Saurabh
date: : October 17 (Thursday); 11:30am – 12:30pm

Abstract: We will present a very simple streaming algorithm on F0 estimation that also caught the eye of Donald E. Knuth. In a recent article, Donald E. Knuth started with the following two paragraphs:

Sourav Chakraborty, N. V. Vinodchandran, and Kuldeep S. Meel have recently proposed an interesting algorithm for the following problem: A stream of elements ( (a_1, a_2, \ldots, a_m) ) is input, one at a time, and we want to know how many of them are distinct. In other words, if ( A = { a_1, a_2, \ldots, a_m } ) is the set of elements in the stream, with multiplicities ignored, we want to know ( |A| ), the size of that set. But we don’t have much memory; in fact, ( |A| ) is probably a lot larger than the number of elements that we can hold in memory at any one time. What is a good strategy for computing an unbiased estimate of ( |A| )?

Their algorithm is not only interesting, it is extremely simple. Furthermore, it’s wonderfully suited to teaching students who are learning the basics of computer science. (Indeed, ever since I saw it, a few days ago, I’ve been unable to resist trying to explain the ideas to just about everybody I meet.) Therefore I’m pretty sure that something like this will eventually become a standard textbook topic. This note is an initial approximation to what I might write about it, if I were preparing a textbook about data streams.”

This simple algorithm comes out of the first ever “efficient” streaming algorithm (from PODS 21) for the Klee’s Measure problem, which was a big open problem in the world of streaming for many years.

This work is based on joint works with N. V. Vinodchandran, and Kuldeep S. Meel across multiple articles, notable the following: Estimating the Size of Union of Sets in Streaming Models. PODS 2021 Estimation of the Size of Union of Delphic Sets: Achieving Independence from Stream Size. PODS 2022 Distinct Elements in Streams: An Algorithm for the (Text) Book. ESA 2022

Speaker Profile:

Sourav Chakraborty is a Professor in Computer Science at the Indian Statistical Institute, Kolkata, India. He finished his Bachelors in Mathematics from Chennai Mathematical Institute in 2003 and then went on to do his Master’s (in 2005) and Ph.D. (in 2008) in Computer Science from the University of Chicago under the guidance of Prof. Laszlo Babai. After doing one year of postdoc at Technion, Israel, and one year of postdoc at CWI Amsterdam, he joined Chennai Mathematical Institute in 2010. He joined the Indian Statistical Institute in 2018. His main area of interest is Theoretical Computer Science with a particular focus on query complexity, property testing, Fourier Analysis of Boolean functions, streaming algorithms, and graph algorithms. Recently he has been using property testing techniques in other more applied areas. Date:
October 17 (Thursday); 11:30am – 12:30pm

Venue
LHC-07 (Lecture Hall Complex)