Open Access System for Information Sharing

Optimizing GPU Program and Architecture for Graph Processing and Convolutional Neural Network

Authors
박현선
Date Issued
2017
Publisher
포항공과대학교 (Pohang University of Science and Technology, POSTECH)
Abstract
Large-scale data processing is becoming an increasingly important field as the amount of available information grows. For such workloads, the GPU is considered an efficient execution platform because of its massively parallel computation and high memory bandwidth. In our investigation, however, existing GPU programming methods for large-scale data processing do not fully exploit the GPU's high memory bandwidth or its high computing power. In this dissertation, we therefore propose optimization methods for two representative large-scale data processing workloads, graph computation and deep convolutional neural networks (CNNs), that take the hardware characteristics of the GPU into account.

To optimize graph computation, we propose a novel optimization called locality-aware vertex scheduling, which minimizes memory requests by adjusting the order of vertex computations to improve the temporal locality of vertex data stored in on-chip caches. Experiments with nine real-world graphs and three graph algorithms on a recent GPU platform show that the proposed method offers a significant speedup (46% on average) over the state-of-the-art graph algorithm implementation on GPUs.

Convolution operations dominate the total execution time of CNNs, so we also aim at enhancing the performance of the state-of-the-art convolution algorithm (called Winograd convolution) on the GPU. Our work is based on two observations: (1) the performance benefit of Winograd convolution is limited mainly by the extra additions incurred during data transformation, and (2) CNNs often have abundant zero weights. To address the first observation, we present a data reuse optimization for the addition operations in Winograd convolution (called AddOpt), which improves the utilization of local registers and thereby reduces on-chip cache accesses. In addition, to exploit the abundant zero weights, we propose a low-overhead and efficient hardware mechanism (called ZeroSkip) that skips multiplications whose results are always zero regardless of the input data. Our experiments with a real-world deep CNN, VGG-16, on GPGPU-Sim and Titan X show that the proposed methods, ZeroSkip and AddOpt, achieve 51.8% higher convolution performance than the baseline Winograd convolution. Moreover, even without any hardware modification, AddOpt alone gives 35.6% higher performance on a real hardware platform, Titan X. (Illustrative sketches of these ideas are given after the item record below.)
URI
http://postech.dcollection.net/jsp/common/DcLoOrgPer.jsp?sItemId=000002327721
https://oasis.postech.ac.kr/handle/2014.oak/93303
Article Type
Thesis
Files in This Item:
There are no files associated with this item.
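
The locality-aware vertex scheduling described in the abstract reorders vertex computations so that vertices whose neighbor data overlaps are processed close together in time. The CUDA kernel below is a minimal illustrative sketch of that idea, not the dissertation's implementation: it assumes a CSR graph representation and a host-built array named schedule (a hypothetical name) holding the reordered vertex IDs; the abstract does not specify the actual ordering heuristic.

```cuda
// Hypothetical sketch: a vertex-centric kernel that visits vertices in a
// precomputed order ("schedule") instead of raw vertex-ID order. The schedule
// is assumed to be built on the host so that vertices with overlapping
// neighborhoods are processed close together, so their neighbor data is more
// likely to still reside in on-chip caches when it is reused.
__global__ void relax_vertices(const int   *schedule,     // reordered vertex IDs
                               const int   *row_offsets,  // CSR row pointers
                               const int   *col_indices,  // CSR neighbor lists
                               const float *src_values,   // per-vertex input data
                               float       *dst_values,   // per-vertex output data
                               int          num_vertices)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= num_vertices) return;

    int v = schedule[i];  // process the i-th vertex of the locality-aware order
    float acc = 0.0f;
    for (int e = row_offsets[v]; e < row_offsets[v + 1]; ++e) {
        // Neighbor loads issued here benefit from cache hits when nearby
        // scheduled vertices share neighbors.
        acc += src_values[col_indices[e]];
    }
    dst_values[v] = acc;
}
```

The Winograd-related ideas can be illustrated at the level of a single 1-D F(2,3) tile, the basic building block of Winograd convolution. The device function below (winograd_f2_3, a hypothetical name) is again only a sketch under assumptions, not the dissertation's AddOpt or ZeroSkip implementation: it keeps all transformed terms in registers and computes shared sub-expressions of the transforms once, in the register-reuse spirit of AddOpt, and it guards each element-wise multiplication with a zero test as a software stand-in for the ZeroSkip hardware mechanism, which the dissertation realizes in hardware without explicit branches.

```cuda
// Hypothetical sketch of a 1-D Winograd F(2,3) tile: 4 input samples and a
// 3-tap filter produce 2 output samples with 4 multiplications.
__device__ void winograd_f2_3(const float d[4],  // 4 input samples
                              const float g[3],  // 3 filter weights
                              float y[2])        // 2 output samples
{
    // Filter transform G*g; the shared sub-expression (g0 + g2)/2 is computed
    // once and reused, keeping everything in registers.
    float g02 = 0.5f * (g[0] + g[2]);
    float u0  = g[0];
    float u1  = g02 + 0.5f * g[1];
    float u2  = g02 - 0.5f * g[1];
    float u3  = g[2];

    // Input transform B^T*d, also kept entirely in registers.
    float v0 = d[0] - d[2];
    float v1 = d[1] + d[2];
    float v2 = d[2] - d[1];
    float v3 = d[1] - d[3];

    // Element-wise products; the multiply is skipped when the transformed
    // weight is zero (software analogue of the ZeroSkip idea).
    float m0 = (u0 != 0.0f) ? u0 * v0 : 0.0f;
    float m1 = (u1 != 0.0f) ? u1 * v1 : 0.0f;
    float m2 = (u2 != 0.0f) ? u2 * v2 : 0.0f;
    float m3 = (u3 != 0.0f) ? u3 * v3 : 0.0f;

    // Output transform A^T*m.
    y[0] = m0 + m1 + m2;
    y[1] = m1 - m2 - m3;
}
```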
