Open Access System for Information Sharing

Department of Computer Science & Engineering (컴퓨터공학과) 2. Conference Papers

Transparent offloading and mapping (TOM): enabling programmer-transparent near-data processing in GPU systems

Title: Transparent offloading and mapping (TOM): enabling programmer-transparent near-data processing in GPU systems

Authors: HSIEH, KEVIN; EBRAHIMI, EIMAN; KIM, GWANGSUN; CHATTERJEE, NILADRISH; O'CONNOR, MIKE; VIJAYKUMAR, NANDITA; MUTLU, ONUR; KECKLER, STEPHEN W.

Date Issued: 2016-06-20

Publisher: IEEE

Abstract: Main memory bandwidth is a critical bottleneck for modern GPU systems due to limited off-chip pin bandwidth. 3D-stacked memory architectures provide a promising opportunity to significantly alleviate this bottleneck by directly connecting a logic layer to the DRAM layers with high bandwidth connections. Recent work has shown promising potential performance benefits from an architecture that connects multiple such 3D-stacked memories and offloads bandwidth-intensive computations to a GPU in each of the logic layers. An unsolved key challenge in such a system is how to enable computation offloading and data mapping to multiple 3D-stacked memories without burdening the programmer such that any application can transparently benefit from near-data processing capabilities in the logic layer. Our paper develops two new mechanisms to address this key challenge. First, a compiler-based technique that automatically identifies code to offload to a logic-layer GPU based on a simple cost-benefit analysis. Second, a software/hardware cooperative mechanism that predicts which memory pages will be accessed by offloaded code, and places those pages in the memory stack closest to the offloaded code, to minimize off-chip bandwidth consumption. We call the combination of these two programmer-transparent mechanisms TOM: Transparent Offloading and Mapping. Our extensive evaluations across a variety of modern memory-intensive GPU workloads show that, without requiring any program modification, TOM significantly improves performance (by 30% on average, and up to 76%) compared to a baseline GPU system that cannot offload computation to 3D-stacked memories.

URI: https://oasis.postech.ac.kr/handle/2014.oak/94454

Article Type: Conference

Citation: 2016 43rd International Symposium on Computer Architecture, pp. 204-216, 2016-06-20
