Design and Analysis of Soft-Error Resilience Mechanisms for GPU Register File

Sparsh Mittal, Haonan Wang, Adwait Jog, Jeffrey S. Vetter
Abstract: Modern graphics processing units (GPUs) use increasingly large register files (RFs), which occupy a large fraction of the GPU core area and are accessed very frequently. This makes the RF vulnerable to soft errors (SEs). In this paper, we present two techniques for improving the SE resilience of the GPU RF. First, we propose compressing RF values to reduce the number of vulnerable bits. We leverage value similarity and the presence of narrow-width values to perform compression at the warp level and the thread level, respectively. Second, ...
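
As a rough illustration of the first technique, the sketch below counts how many register bits remain occupied (and hence vulnerable) for one warp-wide register under two simple schemes. It is not the encoding proposed in the paper: the warp size of 32, the 32-bit register width, the base-plus-delta layout used for warp-level value similarity, the sign-extension rule used for narrow-width values, and all function names are assumptions made for this example, and per-register metadata bits are ignored.

```python
WARP_SIZE = 32   # threads per warp (assumed)
REG_BITS = 32    # bits per register value (assumed)


def signed_width(v: int) -> int:
    """Minimum number of two's-complement bits needed to hold v."""
    magnitude_bits = v.bit_length() if v >= 0 else (~v).bit_length()
    return magnitude_bits + 1  # one extra bit for the sign


def uncompressed_bits(values):
    """Every thread stores a full-width value."""
    return REG_BITS * len(values)


def thread_level_bits(values):
    """Narrow-width sketch: each thread keeps only its significant bits."""
    return sum(min(signed_width(v), REG_BITS) for v in values)


def warp_level_bits(values):
    """Value-similarity sketch: one full base plus equal-width deltas."""
    base = values[0]
    deltas = [v - base for v in values[1:]]
    delta_width = max((signed_width(d) for d in deltas), default=0)
    total = REG_BITS + delta_width * len(deltas)
    # Fall back to the uncompressed layout if the deltas do not help.
    return min(total, uncompressed_bits(values))


# Similar values across the warp (e.g. strided addresses).
addresses = [0x10000000 + 4 * t for t in range(WARP_SIZE)]
# Narrow values in every thread (e.g. small loop indices).
indices = list(range(WARP_SIZE))

for name, vals in (("addresses", addresses), ("indices", indices)):
    print(name, uncompressed_bits(vals),
          warp_level_bits(vals), thread_level_bits(vals))
```

Under these assumptions, the strided addresses shrink far more with the warp-level scheme, while the small indices shrink more with the thread-level scheme, which is one way to see why the two levels target different value patterns.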