The challenge facing computer systems researchers is to deliver on the so-called “Moore’s Law of expectation”: that computing systems must get better, cheaper, and faster at an exponential rate. To date, we have always delivered on this promise, but a number of factors make the current version more challenging. Since the end of Dennard scaling over a decade ago, power and energy have become de facto constraints, which has led to greatly increased on-chip parallelism in the form of multi-cores, many-cores, GPUs, and accelerators.
In the past, programmers were shielded from the details of processor evolution. A single architectural abstraction (the von Neumann machine) and a single programming/algorithmic abstraction (the random access machine, RAM) allowed a clean separation of algorithmic, programming, and architectural advances. These abstractions are no longer valid: multi- and many-cores have ended the “La-Z-Boy Programming” approach, in which programmers simply waited for the next generation of processors to make their code faster. Programmers, library writers, and application developers must be aware of parallelism and memory locality at multiple levels, or risk losing many of the Moore’s Law gains of modern architectures.
However, parallel programming is difficult, especially as architectures are evolving rapidly. More importantly, the associated runtime systems, programming languages, libraries, and frameworks are also evolving. This evolution raises a number of issues that this course aims to tackle:
  • The short-term goal is to learn methods for writing parallel applications for two classes of modern architectures: multi-core microprocessors and many-core accelerators (GPUs, Intel Xeon Phi, etc.). Programmers need to learn to develop highly tuned applications that best exploit today’s emerging architectures.
  • The medium-term objective is to do this in a principled manner, so that the skills transfer easily to other contexts and platforms that students are likely to encounter in the future.
  • Finally, the long-term goal is to enable foundational research that renders the first two challenges moot. This can be achieved through automatic compilation and code-generation tools, enabling a “return to La-Z-Boy Programming.” This research uses a quantitative approach based on a mathematical formalism called the Polyhedral Model.
The polyhedral model provides, for an important and precisely defined class of computations, a basis for (i) describing and formalizing them, (ii) analyzing them using quantitative performance metrics, and (iii) transforming them to produce efficient parallel code.
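As a small illustration (our own example, not taken from the course material), consider a triangular loop nest in which i runs from 0 to N−1 and, inside it, j runs from 0 to i, executing a statement S(i, j) at each step. In the polyhedral model, the iterations of S are the integer points of a polyhedron defined by affine inequalities in the loop indices and the size parameter N; counting those points is one simple instance of the quantitative analysis mentioned in (ii).

```latex
\begin{align*}
\mathcal{D}_S &= \{\, (i, j) \in \mathbb{Z}^2 \mid 0 \le j \le i \le N - 1 \,\} \\
  &= \left\{ \begin{pmatrix} i \\ j \end{pmatrix} \in \mathbb{Z}^2 \;\middle|\;
     \begin{pmatrix} 0 & 1 \\ 1 & -1 \\ -1 & 0 \end{pmatrix}
     \begin{pmatrix} i \\ j \end{pmatrix}
     + \begin{pmatrix} 0 \\ 0 \\ N - 1 \end{pmatrix} \ge \mathbf{0} \right\} \\
\lvert \mathcal{D}_S \rvert &= \sum_{i=0}^{N-1} (i + 1) = \frac{N(N+1)}{2}
\end{align*}
```

Each row of the matrix encodes one constraint (j ≥ 0, i − j ≥ 0, N − 1 − i ≥ 0). Transformations such as tiling or parallelization are then typically expressed as affine maps applied to these points.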
Topics to be covered
  • Parallel computing using GPUs
  • Performance analysis
  • Performance tuning

Teaching Faculty: Prof. Sanjay Rajopadhye, Colorado State University
Course details   
Dates: March 05-09, 2018 (5 days)

The course is organized as five 4-hour modules, delivered as a sequence of lectures and hands-on lab sessions. Each module also includes assignments and tutorials to be completed before the start of the next module (there is some flexibility, depending on the constraints of the local coordinator).
The modules will cover the following topics:
[1 module] GPU programming in CUDA
[2 modules] Optimizing and tuning GPU programs
[2 modules] Advanced optimization: tiling, bottleneck analysis (roofline)
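The roofline model listed for bottleneck analysis bounds a kernel’s attainable performance by the lesser of the machine’s peak compute rate and its memory bandwidth multiplied by the kernel’s arithmetic intensity. The sketch below uses our own notation (π for peak FLOP/s, β for peak memory bandwidth in bytes/s, W for floating-point operations, Q for bytes moved to or from memory). The machine figures are hypothetical, chosen only to make the arithmetic concrete, and the vector-addition estimate assumes each single-precision element is read or written exactly once from memory.

```latex
\begin{align*}
P_{\text{attainable}} &= \min\left(\pi,\; \beta \cdot I\right), \qquad I = \frac{W}{Q} \\
I^{*} &= \frac{\pi}{\beta} = \frac{10 \times 10^{12}\ \text{FLOP/s}}{500 \times 10^{9}\ \text{B/s}} = 20\ \text{FLOP/B} \\
I_{\text{vecadd}} &= \frac{1\ \text{FLOP}}{(4 + 4 + 4)\ \text{B}} = \tfrac{1}{12}\ \text{FLOP/B} \\
P_{\text{vecadd}} &\le \min\left(10\ \text{TFLOP/s},\; 500\ \text{GB/s} \times \tfrac{1}{12}\ \text{FLOP/B}\right) \approx 41.7\ \text{GFLOP/s}
\end{align*}
```

Kernels whose intensity lies below the ridge point I* are memory-bound, so tuning them focuses on reducing data movement (for example through tiling and on-chip reuse) rather than on arithmetic.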

Course Schedule:  
Day 1:
Getting started with GPUs and CUDA. Reductions/scans with commutative and non-commutative operators.

Day 2: CUDA next steps (matrix multiplication and its tuning)

Day 3: Tuning/optimization: bank conflicts (more matrix multiplication)

Day 4:
A full kernel: backpropagation learning

Day 5:
Advanced topics: tiling and autotuning (0/1 knapsack problem)

Registration Details  

Who can attend:

  • Students at all levels (B.Tech/M.Tech/Ph.D.) and faculty from academic institutions 

  • Engineers and researchers of both public and private organizations   

Registration fee:  

  • Students from academic institutions: Rs. 1000 

  • Faculty from academic institutions: Rs. 10000

  • Professionals from industry & research organizations: Rs. 30000 

  • Any participant from abroad: USD 500   

The registration fee includes all instructional material, computer use for tutorials, and free Internet access during the course lectures and tutorials.
Participants who opt for accommodation at IITH will be housed in the IITH guest house/hostels on a payment basis.

Registration Process:

Pay the course registration fee and complete your registration in one of two ways:

  • Electronic Fund Transfer: Bank: State Bank of India, IIT Kandi Branch, Hyderabad, India. Branch code: 014182. IFS Code (within India): SBIN0014182. Account No.: 30859878032 (Current A/c). For remittance from abroad, use SWIFT code SBININBB762. MICR code: 502002528.
  • DD in favor of Registrar, IIT Hyderabad, payable at SBI, IIT Kandi Branch (IFS Code: SBIN0014182).
  • The DD, a copy of the ID proof issued by the organization mentioned in the registration form, and the registration form itself should be sent to the course coordinator at the address given below.


The last date for registration is February 23, 2018; acceptance is on a first-come, first-served basis.
Course Coordinator:
Dr. Ramakrishna Upadrasta
Department of Computer Science and Engineering, IIT Hyderabad
Kandi, Sangareddy 502 285
Telangana, India
Tel: (+91) 40-23018445