Optimizing Geo-Distributed Data Analytics
Title of the Talk: Optimizing Geo-Distributed Data Analytics
Speaker: Dhruv Kumar
Host Faculty: Dr. Praveen T
Date & Time: Friday, 30th July 2021, 11:20 - 13:00 Hrs
Abstract:
Large-scale data analytics services require the collection and analysis of data from end-user applications and devices distributed around the globe. Such analytics requires data to be transferred over the wide-area network (WAN) to data centers (DCs) capable of processing the data. Sending the raw data directly from the user devices to DCs is a tempting solution but highly impractical given the large data volumes. Since WAN bandwidth is expensive and scarce, it is beneficial to reduce WAN traffic by partially processing the data closer to end-users. Enterprises are, thus, increasingly moving towards a geo-distributed edge-cloud infrastructure for performing such analytics as close to the user devices as possible. This talk presents some of the research aimed at utilizing the distributed edge-cloud infrastructure to optimize metrics such as delay, WAN traffic, and monetary cost for analytics services. The proposed solutions are implemented on top of popular analytics engines such as Apache Flink and Apache Spark. Evaluations are carried out using a geo-distributed deployment on Amazon EC2 as well as a WAN-emulated local testbed. Our evaluation using real-world traces from Twitter and Akamai as well as synthetic benchmarks shows significant improvements over existing state-of-the-art approaches.
Speaker Profile:
Dhruv Kumar is a final-year Ph.D. candidate at the University of Minnesota, Twin Cities. His research centers on building large-scale systems for distributed analytics. His work has been published in venues such as ACM SIGMETRICS, ACM/IEEE SEC, USENIX HotEdge, and ACM EdgeSys. Dhruv is a recipient of the 3M Science and Technology Fellowship. During his Ph.D., he also interned with Google Cloud in California. He graduated from BITS Pilani in 2014 with a bachelor’s degree in Computer Science. At BITS Pilani, he did research on designing and implementing high-performance algorithms for shared- and distributed-memory systems. Before joining the Ph.D. program at UMN, Dhruv worked on optimizing data processing pipelines at Goldman Sachs, Bengaluru.