Open Access System for Information Sharing

Optimizing GPU Program and Architecture for Graph Processing and Convolutional Neural Network

Authors
박현선
Date Issued
2017
Publisher
포항공과대학교 (Pohang University of Science and Technology, POSTECH)
Abstract
Large-scale data processing is becoming an increasingly important field as the amount of available information grows. For such workloads, the GPU is considered an efficient execution platform because of its massively parallel computation and high memory bandwidth. In our investigation, however, existing GPU programming methods for large-scale data processing do not fully exploit the GPU's high memory bandwidth or its high computing power. In this dissertation, we therefore propose optimization methods for two representative large-scale data processing workloads, graph computation and deep convolutional neural networks (CNNs), that take the hardware characteristics of the GPU into account.

To optimize graph computation, we propose a novel optimization called locality-aware vertex scheduling, which minimizes memory requests by adjusting the order of vertex computations to improve the temporal locality of vertex data stored in on-chip caches. Experiments with nine real-world graphs and three graph algorithms on a recent GPU platform show that the proposed method offers a significant speedup (46% on average) over the state-of-the-art graph algorithm implementation on GPUs.

Convolution operations dominate the total execution time of CNNs, so we also aim at enhancing the performance of the state-of-the-art convolution algorithm (called Winograd convolution) on the GPU. Our work is based on two observations: (1) the performance benefit of Winograd convolution is limited mainly by the extra additions incurred during data transformation, and (2) CNNs often have abundant zero weights. To address the first observation, we present a data reuse optimization for the addition operations in Winograd convolution (called AddOpt), which improves the utilization of local registers and thereby reduces on-chip cache accesses. In addition, to exploit the abundant zero weights, we propose a low-overhead and efficient hardware mechanism (called ZeroSkip) that skips multiplications whose results are always zero regardless of the input data. Our experiments with a real-world deep CNN, VGG-16, on GPGPU-Sim and Titan X show that the proposed methods, ZeroSkip and AddOpt, achieve 51.8% higher convolution performance than the baseline Winograd convolution. Moreover, even without any hardware modification, AddOpt alone gives 35.6% higher performance on a real hardware platform, Titan X. (Illustrative sketches of these ideas are given after the item record below.)
URI
http://postech.dcollection.net/jsp/common/DcLoOrgPer.jsp?sItemId=000002327721
https://oasis.postech.ac.kr/handle/2014.oak/93303
Article Type
Thesis
Files in This Item:
There are no files associated with this item.
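
The locality-aware vertex scheduling described in the abstract reorders vertex computations so that vertices whose neighbor data overlaps are processed close together in time. The CUDA kernel below is a minimal illustrative sketch of that idea, not the dissertation's implementation: it assumes a CSR graph representation and a host-built array named schedule (a hypothetical name) holding the reordered vertex IDs; the abstract does not specify the actual ordering heuristic.

```cuda
// Hypothetical sketch: a vertex-centric kernel that visits vertices in a
// precomputed order ("schedule") instead of raw vertex-ID order. The schedule
// is assumed to be built on the host so that vertices with overlapping
// neighborhoods are processed close together, so their neighbor data is more
// likely to still reside in on-chip caches when it is reused.
__global__ void relax_vertices(const int   *schedule,     // reordered vertex IDs
                               const int   *row_offsets,  // CSR row pointers
                               const int   *col_indices,  // CSR neighbor lists
                               const float *src_values,   // per-vertex input data
                               float       *dst_values,   // per-vertex output data
                               int          num_vertices)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= num_vertices) return;

    int v = schedule[i];  // process the i-th vertex of the locality-aware order
    float acc = 0.0f;
    for (int e = row_offsets[v]; e < row_offsets[v + 1]; ++e) {
        // Neighbor loads issued here benefit from cache hits when nearby
        // scheduled vertices share neighbors.
        acc += src_values[col_indices[e]];
    }
    dst_values[v] = acc;
}
```

The Winograd-related ideas can be illustrated at the level of a single 1-D F(2,3) tile, the basic building block of Winograd convolution. The device function below (winograd_f2_3, a hypothetical name) is again only a sketch under assumptions, not the dissertation's AddOpt or ZeroSkip implementation: it keeps all transformed terms in registers and computes shared sub-expressions of the transforms once, in the register-reuse spirit of AddOpt, and it guards each element-wise multiplication with a zero test as a software stand-in for the ZeroSkip hardware mechanism, which the dissertation realizes in hardware without explicit branches.

```cuda
// Hypothetical sketch of a 1-D Winograd F(2,3) tile: 4 input samples and a
// 3-tap filter produce 2 output samples with 4 multiplications.
__device__ void winograd_f2_3(const float d[4],  // 4 input samples
                              const float g[3],  // 3 filter weights
                              float y[2])        // 2 output samples
{
    // Filter transform G*g; the shared sub-expression (g0 + g2)/2 is computed
    // once and reused, keeping everything in registers.
    float g02 = 0.5f * (g[0] + g[2]);
    float u0  = g[0];
    float u1  = g02 + 0.5f * g[1];
    float u2  = g02 - 0.5f * g[1];
    float u3  = g[2];

    // Input transform B^T*d, also kept entirely in registers.
    float v0 = d[0] - d[2];
    float v1 = d[1] + d[2];
    float v2 = d[2] - d[1];
    float v3 = d[1] - d[3];

    // Element-wise products; the multiply is skipped when the transformed
    // weight is zero (software analogue of the ZeroSkip idea).
    float m0 = (u0 != 0.0f) ? u0 * v0 : 0.0f;
    float m1 = (u1 != 0.0f) ? u1 * v1 : 0.0f;
    float m2 = (u2 != 0.0f) ? u2 * v2 : 0.0f;
    float m3 = (u3 != 0.0f) ? u3 * v3 : 0.0f;

    // Output transform A^T*m.
    y[0] = m0 + m1 + m2;
    y[1] = m1 - m2 - m3;
}
```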
