Adaptive Protocols and Reconfigurable Optical Interconnects for Datacenter Networks
Title of the Talk: Adaptive Protocols and Reconfigurable Optical Interconnects for Datacenter Networks
Speaker: Dr. Vamsi Addanki
Host Faculty: Prof. Praveen T
Date: Feb 25, 2025
Time: 11:00 am
Venue: EE Seminar Hall, EECS Building
Abstract:
Datacenter networks form the backbone of modern computing and storage, driving the expansion of online services and applications. As these networks evolve, the demand for higher bandwidth and lower latency has increased significantly. Recently, GPU clusters have emerged within datacenters for large-scale distributed training, presenting unique networking challenges. This presentation gives a brief overview of my prior work on four critical challenges in datacenter networks (congestion control, load balancing, buffer sharing, and reconfigurable optical interconnects) and is structured in three main parts.
First, we address the transport protocol problem in modern datacenters. We propose PowerTCP [1], a novel congestion control algorithm that dynamically adjusts the congestion window based on the bandwidth-window product (or “power”), a new congestion indicator. Through both analytical and empirical validation, we demonstrate PowerTCP’s practicality in datacenter networks, showing that it meets key requirements for high throughput and low latency. We further introduce Ethereal [2], a transport protocol specifically designed for distributed training workloads in Clos-based topologies. Ethereal achieves optimal load balancing, akin to packet spraying, by minimally splitting a few application flows while maintaining single-path transport semantics from the network’s perspective. Empirically, we show that Ethereal outperforms state-of-the-art algorithms in collective communication completion times.
Second, we tackle the buffer-sharing problem in datacenter switches. We propose ABM [3], an innovative buffer-sharing algorithm that ensures isolation across different traffic classes while improving burst absorption. As an extension, we introduce Reverie [4], a solution that enables lossy and lossless traffic to coexist within the same network. Reverie preserves isolation between these traffic types while enhancing burst absorption and flow completion times for both. Additionally, we propose Credence [5], the first buffer-sharing algorithm to integrate machine-learned predictions. Our analysis shows that Credence achieves near-optimal throughput under perfect predictions and performs effectively even with imperfect predictions, significantly improving flow completion times in empirical tests.
Finally, as Moore’s law approaches its limits, we address the challenge of high-performance optical interconnects in datacenter networks. We present the first formal result on the throughput of periodic networks, establishing an equivalence to a corresponding static network. Based on this result, we propose Mars [6], a demand-oblivious reconfigurable optical interconnect that achieves near-optimal throughput and low latency across various traffic patterns, even with shallow buffers. Additionally, we introduce Vermilion [7], a demand-aware optical interconnect that dynamically reconfigures according to traffic patterns. Our analysis and empirical results show that Vermilion delivers high throughput across diverse traffic patterns, exceeding the throughput capabilities of demand-oblivious interconnects.
Looking ahead, the presentation concludes with a short overview of the exciting opportunities that optical circuit-switching presents for distributed training in GPU clusters.
Note:
The speaker will present one topic from each of the three parts, but is happy to discuss any of the topics in greater detail. Please feel free to interrupt at any point during the presentation with questions or discussion.
Bio:
Vamsi Addanki is a recent PhD graduate and currently a postdoctoral researcher at the Technical University of Berlin. His passion for computer science and networking has taken him from Sri Chaitanya in Hyderabad to BITS Goa, followed by his Bachelor’s thesis at Télécom Paris. He then pursued a Master’s at Sorbonne University and completed his Master’s thesis at ETH Zurich.