Communication-aware Heterogeneous 2.5D System for Energy-efficient LLM Execution at the Edge
Title of the Talk: Communication-aware Heterogeneous 2.5D System for Energy-efficient LLM Execution at the Edge
Speaker: Sumit K. Mandal
Host Faculty: Dr. Jyothi Vedurada
Date: Mar 07, 2025
Time: 2:30 PM
Venue: CS-LH-03, EECS Building
Abstract: Large Language Models (LLMs) are used to perform a wide variety of tasks, especially in the domain of natural language processing (NLP). State-of-the-art LLMs contain a large number of parameters and therefore require a high volume of computation. Currently, GPUs are the preferred hardware platform for LLM inference. However, monolithic GPU-based systems executing large LLMs suffer significant drawbacks in terms of fabrication cost and energy efficiency. In this work, we propose a heterogeneous 2.5D chiplet-based architecture for accelerating LLM inference. Thorough experimental evaluations with a wide variety of LLMs show that the proposed 2.5D system provides up to 972× improvement in latency and 1600× improvement in energy consumption with respect to state-of-the-art edge devices equipped with GPUs.
Bio of the Speaker: Sumit K. Mandal is currently an Assistant Professor at the Indian Institute of Science, Bangalore. He completed his PhD at the University of Wisconsin-Madison. He received a Best Paper Award from ACM TODAES in 2020 and at ESWEEK in 2022. His research interests include energy-efficient communication architectures for machine learning applications with emerging technologies.