Energy-Efficient Data Compression for GPU Memory Systems

Gennady Pekhimenko, Evgeny Bolotin, Mike O'Connor, Onur Mutlu, Todd C. Mowry, Stephen W. Keckler

International Conference on Architectural Support for Programming Languages and Operating Systems, Student Research Competition, Istanbul, Turkey, March 2015

 

Abstract

Modern data-intensive computing forces system designers to deliver good performance under several major constraints: a limited system power/energy budget (the power wall [6]), high memory latency (the memory wall [22]), and limited on-chip/off-chip memory bandwidth (the bandwidth wall [15]). Many techniques have been proposed to address these issues, but, unfortunately, they usually offer a trade-off: improving one constraint at the cost of another. Ideally, system designers would like to improve one or several system parameters, e.g., on-chip/off-chip bandwidth consumption, with minimal (if any) negative impact on the other key parameters. One potential way of achieving this is hardware-based data compression [23, 1, 8, 14, 4], and more specifically, bandwidth (or link) compression. Data compression exploits the high data redundancy observed in many modern applications [1, 14, 16, 4], and can be used to improve both the capacity of storage structures (e.g., caches, DRAM, non-volatile memories [23, 1, 8, 14, 4, 13, 18]) and the bandwidth utilization of interconnects (e.g., on-chip and off-chip buses [17, 13, 18]). Several recent works [17, 13, 18, 3, 21] apply data compression to decrease memory traffic by sending/receiving data in compressed form for both CPUs [13, 21, 3] and GPUs [17, 12], resulting in better system performance and/or energy consumption. Bandwidth compression is especially effective for GPU applications [17, 12], where limited main memory bandwidth is usually the major bottleneck to achieving high performance [10] and there is significant redundancy in the transferred data [17, 12].
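To illustrate the kind of data redundancy such schemes exploit, the following is a minimal sketch (not the paper's actual mechanism) of a base+delta compressor in the spirit of BDI-style compression: a block of wide values is stored as one base value plus narrow per-word deltas whenever all the deltas fit in a small number of bytes. The function names and the one-byte delta width are illustrative assumptions.

```python
def compress(block, delta_bytes=1):
    """Return (base, deltas) if every word in the block can be expressed
    as a signed delta of width delta_bytes from the first word; otherwise
    return None (the block stays uncompressed)."""
    base = block[0]
    limit = 1 << (8 * delta_bytes - 1)  # signed range for the chosen delta width
    deltas = [v - base for v in block]
    if all(-limit <= d < limit for d in deltas):
        return base, deltas  # hardware would pack these into a narrow line
    return None  # incompressible at this delta width

def decompress(base, deltas):
    """Reconstruct the original block from the base and the deltas."""
    return [base + d for d in deltas]

# Nearby pointers and array indices, common in GPU workloads, compress well:
block = [0x1000, 0x1004, 0x1008, 0x1010]
packed = compress(block)
assert packed is not None
assert decompress(*packed) == block
```

With a 1-byte delta per 4-byte word plus one base, a block like the one above shrinks to roughly a third of its original size, which is the kind of saving a link-compression scheme can translate into reduced memory traffic.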

 
