Rethinking Testing and Benchmarking in Data Systems

Title of the Talk: From Static to Strategic: Rethinking Testing and Benchmarking in Data Systems
Speaker: Dr. Anupam Sanghi
Host Faculty: Ashish Mishra
Date: Wednesday, December 4, 2024
Time: 11:30 am to 1:00 pm
Venue: Online
(https://meet.google.com/ykr-yfiq-tfw)

Abstract:

Data systems are the backbone of modern data-driven computing, powering everything from decision-making processes to critical enterprise applications. However, the current landscape of testing and benchmarking these systems is plagued by a lack of automation and an inability to effectively evaluate systems in real-world customer deployments. This talk focuses on bridging the gap between synthetic benchmarks and real-world performance evaluations, moving beyond static methods toward adaptive and strategic approaches. Specifically, we will discuss a dynamic data generation approach that leverages query execution plans from customer deployments to synthesize data. This enables the creation of synthetic data that replicates customer query processing environments, allowing for more realistic system evaluations. Additionally, we will present our evaluation study examining how effectively language models, a cornerstone of the impending AI-driven data systems, understand and process enterprise data. The study highlights the challenges these models face when transitioning from general-purpose public datasets to the complexity of enterprise data. Together, these efforts contribute to the development of a smarter, automated testing and benchmarking paradigm, essential for ensuring the reliability and robustness of data systems.

Speaker Profile:

Anupam Sanghi is currently a Postdoctoral Researcher in the Systems Group at TU Darmstadt, Germany, where he co-leads efforts in Machine Learning for Data Engineering. Prior to this, he served as a Research Scientist at IBM Research, Bangalore, in the Data and AI group, where his work led to the filing of two patents. He holds an M.E. and a Ph.D. from the Indian Institute of Science, Bangalore, with his doctoral research funded by the IBM PhD Fellowship. His Ph.D. work introduced a novel framework for generating synthetic databases that mimic customer query processing environments, which has been warmly received by both academic and industry labs. Before pursuing his Ph.D., Anupam worked as a Technical Project Leader at Huawei Technologies, where he also received the Future Star Award. His research interests lie in Database Systems, and he has presented his work at premier conferences and seminars, including VLDB, EDBT, DASFAA, CODS-COMAD, and the Dagstuhl seminar. Anupam has also led tutorials at the ICDE and CODS-COMAD conferences, sharing his expertise and engaging with the community.
