Open Access System for Information Sharing

Login Library

 

Conference
Cited 0 time in webofscience Cited 0 time in scopus
Metadata Downloads

Automatically Exploiting Implicit Pipeline Parallelism from Multiple Dependent Kernels for GPUs

Title
Automatically Exploiting Implicit Pipeline Parallelism from Multiple Dependent Kernels for GPUs
Authors
KIM, GWANGSUNJEONG, JIYUNKIM, JOHNSTEPHENSON, MARK
Date Issued
2016-09-11
Publisher
IEEE
Abstract
Execution of GPGPU workloads consists of different stages including data I/O on the CPU, memory copy between the CPU and GPU, and kernel execution. While GPU can remain idle during I/O and memory copy, prior work has shown that overlapping data movement (I/O and memory copies) with kernel execution can improve performance. However, when there are multiple dependent kernels, the execution of the kernels is serialized and the benefit of overlapping data movement can be limited. In order to improve the performance of workloads that have multiple dependent kernels, we propose to automatically overlap the execution of kernels by exploiting implicit pipeline parallelism. We first propose Coarse-grained Reference Counting-based Scoreboarding (CRCS) to guarantee correctness during overlapped execution of multiple kernels. However, CRCS alone does not necessarily improve overall performance if the thread blocks (or CTAs) are scheduled sequentially. Thus, we propose an alternative CTA scheduler - Pipeline Parallelism-aware CTA Scheduler (PPCS) that takes available pipeline parallelism into account in CTA scheduling to maximize pipeline parallelism and improve overall performance. Our evaluation results show that the proposed mechanisms can improve performance by up to 67% (33% on average). To the best of our knowledge, this is one of the first work that enables overlapped execution of multiple dependent kernels without any kernel modification or explicitly expressing dependency by the programmer.
URI
https://oasis.postech.ac.kr/handle/2014.oak/94451
Article Type
Conference
Citation
2016 International Conference on Parallel Architecture and Compilation Techniques (PACT), 2016-09-11
Files in This Item:
There are no files associated with this item.

qr_code

  • mendeley

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.

Views & Downloads

Browse