Energy-Efficient Data Compression for GPU Memory Systems
International Conference on Architectural Support for Programming Languages and Operating Systems, Student Research Competition, Istanbul, Turkey, March 2015
Abstract
Modern data-intensive computing forces system designers to deliver good performance under several major constraints: a limited system power/energy budget (the power wall [6]), high memory latency (the memory wall [22]), and limited on-chip/off-chip memory bandwidth (the bandwidth wall [15]). Many techniques have been proposed to address these issues, but, unfortunately, they usually involve a trade-off: improving one constraint at the cost of another. Ideally, system designers would like to improve one or several system parameters, e.g., on-chip/off-chip bandwidth consumption, with minimal (if any) negative impact on the other key parameters. One potential way to achieve this is hardware-based data compression [23, 1, 8, 14, 4], and more specifically, bandwidth (link) compression. Data compression exploits the high data redundancy observed in many modern applications [1, 14, 16, 4], and can be used to improve both the capacity of memory structures (e.g., caches, DRAM, non-volatile memories [23, 1, 8, 14, 4, 13, 18]) and the bandwidth utilization of interconnects (e.g., on-chip and off-chip buses [17, 13, 18]). Several recent works [17, 13, 18, 3, 21] apply data compression to reduce memory traffic by sending/receiving data in compressed form, for both CPUs [13, 21, 3] and GPUs [17, 12], resulting in better system performance and/or energy consumption. Bandwidth compression is especially effective for GPU applications [17, 12], where limited main memory bandwidth is usually the major bottleneck to achieving high performance [10] and there is significant redundancy in the transferred data [17, 12].
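To make the redundancy argument concrete, the sketch below illustrates one family of low-latency compression schemes suited to hardware implementation: base-delta compression, in the spirit of the cited work [14], where the words of a cache line often cluster near a common base value. The function name, line geometry (eight 4-byte words), and single-byte delta width are illustrative assumptions rather than the exact design of any cited scheme.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define WORDS_PER_LINE 8  /* assumed: 32-byte line of 4-byte words */

/* Hypothetical sketch of base-delta compression: store the line as a
 * 4-byte base plus one signed 1-byte delta per word. Returns the
 * compressed size in bytes, or the original size if the line does not
 * fit the base+delta pattern (i.e., is stored uncompressed). */
size_t compress_base_delta(const uint32_t *line, uint8_t *out)
{
    uint32_t base = line[0];
    uint8_t deltas[WORDS_PER_LINE];

    for (int i = 0; i < WORDS_PER_LINE; i++) {
        int64_t delta = (int64_t)line[i] - (int64_t)base;
        /* Every word must lie within a 1-byte signed delta of the
         * base; otherwise the line is incompressible in this scheme. */
        if (delta < -128 || delta > 127)
            return WORDS_PER_LINE * sizeof(uint32_t);
        deltas[i] = (uint8_t)(int8_t)delta;
    }

    memcpy(out, &base, sizeof(base));                   /* 4-byte base */
    memcpy(out + sizeof(base), deltas, sizeof(deltas)); /* 8 deltas    */
    return sizeof(base) + sizeof(deltas);               /* 12 of 32 B  */
}
```

In this sketch, a line of narrow or clustered values (a common pattern in the applications cited above) shrinks from 32 bytes to 12, which is the kind of reduction that bandwidth compression can translate directly into fewer bytes transferred over on-chip and off-chip links.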